{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cB0MgPvpkP1g"
      },
      "source": [
        "# Introduction to ProtBERT"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YKYDuyvOHxN6"
      },
      "source": [
        "### Introduction\n",
        "\n",
        "Proteins, the workhorses of the cell, are chains of amino acids that fold into intricate three-dimensional structures. A protein's structure determines how it interacts with other molecules and which tasks it performs in the cell. Understanding how sequence relates to structure and function is fundamental to studying disease mechanisms, drug interactions, and evolutionary relationships.\n",
        "\n",
        "In bioinformatics, interpreting protein sequences underpins a wide range of applications. Predicting a protein's structure from its sequence has long been one of the grand challenges of computational biology, with implications for drug design and personalized medicine. Identifying functional domains within proteins, in turn, helps researchers understand their roles in cellular processes and disease pathways.\n",
        "\n",
        "ProtBERT applies BERT (Bidirectional Encoder Representations from Transformers) to protein sequences. It is pretrained with BERT's masked language modeling objective on large collections of unlabeled protein sequences, treating each amino acid as a token, and learns contextual representations of residues. These representations can then be fine-tuned for downstream tasks such as predicting protein function, classifying proteins into structural categories, or predicting properties like subcellular localization and solubility.\n",
        "\n",
        "This tutorial introduces ProtBERT in the context of DeepChem, an open-source library for deep learning in chemistry and biology. You will learn how to use ProtBERT through pretraining on large-scale protein datasets and fine-tuning for specific tasks such as protein classification and property prediction. The hands-on examples are meant to give you the skills to apply ProtBERT in your own bioinformatics research.\n",
        "\n",
        "By the end of this tutorial, you should understand why protein sequence analysis matters and be equipped to apply ProtBERT effectively in biomedical research and therapeutic development.\n"
      ]
    },
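    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "ProtBERT treats each amino acid as a token. As an illustration, the sketch below prepares a raw sequence in the input format documented for the public `Rostlab/prot_bert` checkpoint on Hugging Face: residues separated by single spaces, with the rare or ambiguous residues U, Z, O and B replaced by X. This follows that model card and is an assumption here, not a description of how DeepChem's `ProtBERT` preprocesses its inputs; the helper `to_protbert_format` is hypothetical.\n",
        "\n",
        "```python\n",
        "# Hypothetical helper (not part of DeepChem): format a sequence for ProtBERT.\n",
        "RARE_RESIDUES = set(\"UZOB\")  # mapped to X, following the Rostlab/prot_bert model card\n",
        "\n",
        "\n",
        "def to_protbert_format(sequence: str) -> str:\n",
        "    # Remove existing whitespace so both \"MKT\" and \"M K T\" are accepted\n",
        "    residues = \"\".join(sequence.split()).upper()\n",
        "    # Replace rare or ambiguous residues with X\n",
        "    residues = [\"X\" if r in RARE_RESIDUES else r for r in residues]\n",
        "    # Separate residues with single spaces so each one becomes its own token\n",
        "    return \" \".join(residues)\n",
        "\n",
        "\n",
        "print(to_protbert_format(\"MKTUZ\"))  # M K T X X\n",
        "```\n",
        "\n",
        "The sequences in the DeepLoc files used below are already space-separated, so for them this step would only change rare residues."
      ]
    },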
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LeSwVF_QH0e8"
      },
      "source": [
        "### Setup\n",
        "\n",
        "#### Install necessary libraries\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 591
        },
        "id": "TBPgOmcwArax",
        "outputId": "657de630-21a5-4aa1-bc28-7eba1254c56c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Collecting deepchem\n",
            "  Downloading deepchem-2.8.1.dev20240724182210-py3-none-any.whl.metadata (2.0 kB)\n",
            "Requirement already satisfied: joblib in /usr/local/lib/python3.10/dist-packages (from deepchem) (1.4.2)\n",
            "Requirement already satisfied: numpy<2 in /usr/local/lib/python3.10/dist-packages (from deepchem) (1.25.2)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from deepchem) (2.0.3)\n",
            "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (from deepchem) (1.2.2)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from deepchem) (1.13.1)\n",
            "Requirement already satisfied: scipy>=1.10.1 in /usr/local/lib/python3.10/dist-packages (from deepchem) (1.11.4)\n",
            "Collecting rdkit (from deepchem)\n",
            "  Downloading rdkit-2024.3.3-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.9 kB)\n",
            "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->deepchem) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->deepchem) (2023.4)\n",
            "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->deepchem) (2024.1)\n",
            "Requirement already satisfied: Pillow in /usr/local/lib/python3.10/dist-packages (from rdkit->deepchem) (9.4.0)\n",
            "Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn->deepchem) (3.5.0)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->deepchem) (1.3.0)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pandas->deepchem) (1.16.0)\n",
            "Downloading deepchem-2.8.1.dev20240724182210-py3-none-any.whl (1.1 MB)\n",
            "Downloading rdkit-2024.3.3-cp310-cp310-manylinux_2_28_x86_64.whl (33.1 MB)\n",
            "Installing collected packages: rdkit, deepchem\n",
            "Successfully installed deepchem-2.8.1.dev20240724182210 rdkit-2024.3.3\n"
          ]
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "WARNING:deepchem.feat.molecule_featurizers.rdkit_descriptors:No normalization for SPS. Feature removed!\n",
            "WARNING:deepchem.feat.molecule_featurizers.rdkit_descriptors:No normalization for AvgIpc. Feature removed!\n",
            "WARNING:tensorflow:From /usr/local/lib/python3.10/dist-packages/tensorflow/python/util/deprecation.py:588: calling function (from tensorflow.python.eager.polymorphic_function.polymorphic_function) with experimental_relax_shapes is deprecated and will be removed in a future version.\n",
            "Instructions for updating:\n",
            "experimental_relax_shapes is deprecated, use reduce_retracing instead\n",
            "WARNING:deepchem.models.torch_models:Skipped loading modules with pytorch-geometric dependency, missing a dependency. No module named 'torch_geometric'\n",
            "WARNING:deepchem.models:Skipped loading modules with pytorch-geometric dependency, missing a dependency. cannot import name 'DMPNN' from 'deepchem.models.torch_models' (/usr/local/lib/python3.10/dist-packages/deepchem/models/torch_models/__init__.py)\n",
            "WARNING:deepchem.models:Skipped loading modules with pytorch-lightning dependency, missing a dependency. No module named 'lightning'\n",
            "WARNING:deepchem.models:Skipped loading some Jax models, missing a dependency. No module named 'haiku'\n"
          ]
        },
        {
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'2.8.1.dev'"
            ]
          },
          "execution_count": 1,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "!pip install --pre deepchem\n",
        "import deepchem\n",
        "deepchem.__version__"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kTlSEuCVwjgh"
      },
      "source": [
        "#### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "WiEbYejmwgT2"
      },
      "outputs": [],
      "source": [
        "import deepchem as dc\n",
        "import pandas as pd\n",
        "from deepchem.models.torch_models import ProtBERT\n",
        "import torch.nn as nn\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-lfH_KEEfoL_"
      },
      "source": [
        "### Dataset\n",
        "\n",
        "We will use the DeepLoc dataset [1] to fine-tune ProtBERT. Specifically, we will fine-tune ProtBERT to predict whether a protein is water soluble (the binary `water soluble` column of the DeepLoc data).\n",
        "\n",
        "References:\n",
        "\n",
        "[1] Almagro Armenteros, José Juan, et al. \"DeepLoc: prediction of protein subcellular localization using deep learning.\" Bioinformatics 33.21 (2017): 3387-3395."
      ]
    },
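    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Since the target is the binary `water soluble` column, it can be worth checking how balanced the two classes are before fine-tuning, because a strongly imbalanced label makes plain accuracy misleading. A minimal pandas sketch follows; `label_balance` and the tiny `toy_df` are hypothetical and only mimic the DeepLoc column names. Once the data is loaded, the same function could be applied to `filtered_train_df`.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "\n",
        "def label_balance(df: pd.DataFrame, label_col: str = \"water soluble\") -> pd.Series:\n",
        "    # Fraction of rows in each class of the binary label, ordered by class value\n",
        "    return df[label_col].value_counts(normalize=True).sort_index()\n",
        "\n",
        "\n",
        "# Tiny hypothetical DataFrame that only mimics the DeepLoc column names\n",
        "toy_df = pd.DataFrame({\n",
        "    \"protein\": [\"M K T\", \"G C I P\", \"M H P I\", \"L D V L\"],\n",
        "    \"water soluble\": [1, 1, 1, 0],\n",
        "})\n",
        "print(label_balance(toy_df))\n",
        "```"
      ]
    },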
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YdthYoD4f1PY",
        "outputId": "b4e94816-6b39-4a24-919f-e621e769e804"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "--2024-07-26 14:56:03--  https://deepchemdata.s3.us-west-1.amazonaws.com/datasets/DeepLoc_test.csv\n",
            "Resolving deepchemdata.s3.us-west-1.amazonaws.com (deepchemdata.s3.us-west-1.amazonaws.com)... 52.219.192.82, 3.5.163.155, 3.5.163.14, ...\n",
            "Connecting to deepchemdata.s3.us-west-1.amazonaws.com (deepchemdata.s3.us-west-1.amazonaws.com)|52.219.192.82|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 32214944 (31M) [text/csv]\n",
            "Saving to: ‘/content/datasets/DeepLoc_test.csv’\n",
            "\n",
            "DeepLoc_test.csv    100%[===================>]  30.72M  47.4MB/s    in 0.6s    \n",
            "\n",
            "2024-07-26 14:56:04 (47.4 MB/s) - ‘/content/datasets/DeepLoc_test.csv’ saved [32214944/32214944]\n",
            "\n",
            "--2024-07-26 14:56:04--  https://deepchemdata.s3.us-west-1.amazonaws.com/datasets/DeepLoc_train.csv\n",
            "Resolving deepchemdata.s3.us-west-1.amazonaws.com (deepchemdata.s3.us-west-1.amazonaws.com)... 52.219.112.193, 52.219.113.130, 52.219.192.66, ...\n",
            "Connecting to deepchemdata.s3.us-west-1.amazonaws.com (deepchemdata.s3.us-west-1.amazonaws.com)|52.219.112.193|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 107467465 (102M) [text/csv]\n",
            "Saving to: ‘/content/datasets/DeepLoc_train.csv’\n",
            "\n",
            "DeepLoc_train.csv   100%[===================>] 102.49M  45.3MB/s    in 2.3s    \n",
            "\n",
            "2024-07-26 14:56:07 (45.3 MB/s) - ‘/content/datasets/DeepLoc_train.csv’ saved [107467465/107467465]\n",
            "\n"
          ]
        }
      ],
      "source": [
        "!wget -P \"/content/datasets\" \"https://deepchemdata.s3.us-west-1.amazonaws.com/datasets/DeepLoc_test.csv\"\n",
        "!wget -P \"/content/datasets\" \"https://deepchemdata.s3.us-west-1.amazonaws.com/datasets/DeepLoc_train.csv\""
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "ZgcUk7tphe0o"
      },
      "outputs": [],
      "source": [
        "# For demo purposes, work with a random subset of the original data.\n",
        "# Sequences are stored as space-separated residues (e.g. \"M K T ...\"), so keeping\n",
        "# strings shorter than 200 characters keeps proteins of roughly 100 residues or fewer.\n",
        "# random_state fixes the sample so that reruns select the same proteins.\n",
        "train_df = pd.read_csv(\"/content/datasets/DeepLoc_train.csv\")\n",
        "string_lengths = train_df[\"protein\"].apply(len)\n",
        "filtered_train_df = train_df[string_lengths < 200].sample(5000, random_state=42)\n",
        "filtered_train_df.to_csv(\"/content/datasets/DeepLoc_train_5000.csv\", index=False)\n",
        "\n",
        "test_df = pd.read_csv(\"/content/datasets/DeepLoc_test.csv\")\n",
        "string_lengths = test_df[\"protein\"].apply(len)\n",
        "filtered_test_df = test_df[string_lengths < 200].sample(1000, random_state=42)\n",
        "filtered_test_df.to_csv(\"/content/datasets/DeepLoc_test_1000.csv\", index=False)"
      ]
    },
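    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because the `protein` column stores residues separated by spaces, the character-based length filter above is only an indirect proxy for sequence length. If you prefer to filter on the number of residues directly, one option is to count whitespace-separated tokens, as in this sketch; `filter_by_residue_count` and its `max_residues` cutoff are hypothetical, and changing the cutoff changes which proteins end up in the subset.\n",
        "\n",
        "```python\n",
        "import pandas as pd\n",
        "\n",
        "\n",
        "def filter_by_residue_count(df: pd.DataFrame, max_residues: int, col: str = \"protein\") -> pd.DataFrame:\n",
        "    # Count residues as whitespace-separated tokens (\"M K T\" -> 3)\n",
        "    n_residues = df[col].str.split().str.len()\n",
        "    # Keep only sequences with strictly fewer residues than the cutoff\n",
        "    return df[n_residues < max_residues]\n",
        "\n",
        "\n",
        "toy_df = pd.DataFrame({\"protein\": [\"M K T\", \"G C I P S F\", \"M H\"]})\n",
        "print(filter_by_residue_count(toy_df, max_residues=4))  # keeps rows 0 and 2\n",
        "```"
      ]
    },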
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "VMt1T2_EzGvw",
        "outputId": "da82e007-8195-4714-cb88-9e7b81ebd4b0"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "      Compound ID                                            protein  \\\n",
              "24682      P0DL40  M H P I I W E L S H M V D L Q A A A Q K L K R ...   \n",
              "11593      P85506  G C I P S F G E C A W F S G E S C C T G I C K ...   \n",
              "50880      Q5UFR8  M K T L L L S P V V V T I V C L D L G Y T M T ...   \n",
              "73047      Q69CK0  M K T L L L T L V V V T I V C L D L G Y T R K ...   \n",
              "64672      P0CAY6  M K C P S I F L T L L I F V S S C T S I L I N ...   \n",
              "\n",
              "       subcellular localization  water soluble  \n",
              "24682                         3              1  \n",
              "11593                         3              1  \n",
              "50880                         3              1  \n",
              "73047                         3              1  \n",
              "64672                         3              1  "
            ]
          },
          "execution_count": 5,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "filtered_train_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "ZI3_SKeUzUsR",
        "outputId": "03ef8222-0556-409d-880f-624689d8a993"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-e2698a62-01d1-4ab3-b4df-f0a0e0681d0c\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Compound ID</th>\n",
              "      <th>protein</th>\n",
              "      <th>subcellular localization</th>\n",
              "      <th>water soluble</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>17649</th>\n",
              "      <td>P0C5W7</td>\n",
              "      <td>M F T L K K S L L L L F F L G T I S L S L C E ...</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>30909</th>\n",
              "      <td>P00295</td>\n",
              "      <td>L D V L L G S D D G E L A F V P N N F S V P S ...</td>\n",
              "      <td>9</td>\n",
              "      <td>0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>25710</th>\n",
              "      <td>B3EWT2</td>\n",
              "      <td>G C I P K H K R C T W S G P K C C N N I S C H ...</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>23850</th>\n",
              "      <td>C9JLW8</td>\n",
              "      <td>M T S S P V S R V V Y N G K R T S S P R S P P ...</td>\n",
              "      <td>7</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>31290</th>\n",
              "      <td>C5J888</td>\n",
              "      <td>M Q Y K T F L V I S L A Y L L V A D E A A A F ...</td>\n",
              "      <td>3</td>\n",
              "      <td>1</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "  </div>\n"
            ],
            "text/plain": [
              "      Compound ID                                            protein  \\\n",
              "17649      P0C5W7  M F T L K K S L L L L F F L G T I S L S L C E ...   \n",
              "30909      P00295  L D V L L G S D D G E L A F V P N N F S V P S ...   \n",
              "25710      B3EWT2  G C I P K H K R C T W S G P K C C N N I S C H ...   \n",
              "23850      C9JLW8  M T S S P V S R V V Y N G K R T S S P R S P P ...   \n",
              "31290      C5J888  M Q Y K T F L V I S L A Y L L V A D E A A A F ...   \n",
              "\n",
              "       subcellular localization  water soluble  \n",
              "17649                         3              1  \n",
              "30909                         9              0  \n",
              "25710                         3              1  \n",
              "23850                         7              1  \n",
              "31290                         3              1  "
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "filtered_test_df.head()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "ot0GE6Sygm3Y"
      },
      "outputs": [],
      "source": [
        "featurizer = dc.feat.DummyFeaturizer()\n",
        "tasks = [\"water soluble\"]\n",
        "loader = dc.data.CSVLoader(tasks=tasks,\n",
        "                            feature_field=\"protein\",\n",
        "                            featurizer=featurizer)\n",
        "deeploc_train_dataset = loader.create_dataset(\"/content/datasets/DeepLoc_train_5000.csv\")\n",
        "deeploc_test_dataset  = loader.create_dataset(\"/content/datasets/DeepLoc_test_1000.csv\")\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-wGOiIxmH_gT"
      },
      "source": [
        "### Understanding ProtBERT\n",
        "\n",
        "ProtBERT is a variant of the BERT (Bidirectional Encoder Representations from Transformers) model adapted to protein sequences. It was developed by researchers at the Rostlab (Technical University of Munich) as part of the ProtTrans project. It applies BERT's self-supervised pretraining to amino acid sequences so that it learns a contextual representation for every residue.\n",
        "\n",
        "#### Key Features of ProtBERT:\n",
        "\n",
        "1. **BERT Architecture Adaptation:** ProtBERT uses the BERT encoder architecture, trained on protein sequences instead of natural-language text. The released model is deeper than BERT-Large, with 30 transformer layers and a hidden size of 1024. Each layer applies self-attention across the whole sequence, so a residue's representation can depend on both neighboring and distant residues. The same encoder supports tasks ranging from masked language modeling (MLM) to sequence classification.\n",
        "\n",
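        "2. **Tokenization and Embedding:** Where BERT splits natural-language text into word pieces, ProtBERT treats each amino acid as a single token. Its vocabulary therefore consists of the amino acid letters plus BERT's special tokens (such as `[CLS]`, `[SEP]` and `[MASK]`). Sequences are supplied as space-separated residues, which is why the `protein` column above is formatted as `M F T L K K ...`. The encoder then maps each token to an embedding that reflects both the residue and its sequence context.\n",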
        "\n",
        "3. **Pretraining and Fine-tuning:** ProtBERT was pretrained with a self-supervised objective on large protein databases: UniRef100 for `Rostlab/prot_bert` and BFD (Big Fantastic Database) for `Rostlab/prot_bert_bfd`. The resulting representations generalize across diverse protein sequences. They can then be fine-tuned for specific tasks such as protein classification (e.g., predicting membrane proteins or subcellular localization). The authors pretrained in stages of increasing sequence length: first on sequences shorter than 512 residues, then on sequences shorter than 1024, and finally on sequences of up to 40,000.\n",
        "\n",
        "4. **Task-specific Adaptation:** Depending on the task, ProtBERT can be adapted with different classifier heads. For instance, it can be configured for single-label or multi-label classification tasks, allowing researchers to tailor it to specific biological questions.\n"
      ]
    },
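    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because ProtBERT tokenizes at the residue level, raw sequences are typically preprocessed before tokenization. Residues are separated by single spaces, and the rare or ambiguous amino acids U, Z, O and B are mapped to X, as in the usage example on the `Rostlab/prot_bert` model card. The cell below is a minimal sketch of that step. `prepare_sequence` is a hypothetical helper, not part of DeepChem, and we have not checked whether DeepChem's ProtBERT wrapper applies the same mapping internally. The DeepLoc CSVs used in this tutorial already store sequences in space-separated form."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import re\n",
        "\n",
        "\n",
        "def prepare_sequence(sequence: str) -> str:\n",
        "    # Hypothetical helper: format a raw sequence the way ProtBERT expects.\n",
        "    # Remove any existing whitespace so already-spaced input is handled too.\n",
        "    residues = \"\".join(sequence.split()).upper()\n",
        "    # Map rare/ambiguous residues (U, Z, O, B) to the unknown residue X.\n",
        "    residues = re.sub(r\"[UZOB]\", \"X\", residues)\n",
        "    # One residue per token, separated by single spaces.\n",
        "    return \" \".join(residues)\n",
        "\n",
        "\n",
        "print(prepare_sequence(\"mktayUB\"))  # M K T A Y X X"
      ]
    },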
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mSBz1nIgIKAD"
      },
      "source": [
        "![ProtBERT.png]()\n",
        "\n",
        "Image reference: Elnaggar, Ahmed, et al. \"ProtTrans: Toward understanding the language of life through self-supervised learning.\" IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hJ-UzTAiIYgR"
      },
      "source": [
        "### Loading ProtBERT\n",
        "\n",
        "Two pretrained ProtBERT checkpoints are available: one trained on UniRef100 and one trained on BFD. Both can be loaded for Masked Language Modeling (MLM) or for sequence classification. This section describes the available checkpoints and how to load ProtBERT in each mode.\n",
        "\n",
        "#### Pretrained Models:\n",
        "\n",
        "1. **UniRef100 Model:**\n",
        "   - **Description:** The UniRef100 model is pretrained on the UniRef100 dataset [1].\n",
        "   - **Usage:** Initialize ProtBERT with `model_path = 'Rostlab/prot_bert'` to load the UniRef100 pretrained model.\n",
        "\n",
        "2. **BFD Model:**\n",
        "   - **Description:** The BFD model is pretrained on the BFD (Big Fantastic Database) dataset [2][3].\n",
        "   - **Usage:** Initialize ProtBERT with `model_path = 'Rostlab/prot_bert_bfd'` to load the BFD pretrained model.\n",
        "\n",
        "#### Supported Modes:\n",
        "\n",
        "1. **Masked Language Modeling (MLM):**\n",
        "   - **Description:** ProtBERT learns to predict masked amino acids in protein sequences, facilitating a deeper understanding of amino acid relationships and sequence contexts.\n",
        "   - **Usage:** Initialize ProtBERT with `task='mlm'` and specify either `model_path = 'Rostlab/prot_bert'` or `model_path = 'Rostlab/prot_bert_bfd'` for MLM tasks.\n",
        "\n",
        "2. **Sequence Classification:**\n",
        "   - **Description:** Enables classification tasks such as predicting membrane proteins, subcellular localization, or custom classifications using a user-defined classifier head.\n",
        "   - **Usage:** Set `task='classification'` to utilize ProtBERT for sequence classification. Specify the `cls_name` parameter as 'LogReg', 'FFN', or 'custom' to use Logistic Regression, a simple 1-layer FFN, or a custom classifier network, respectively.\n",
        "     - **Custom Task:** Set `cls_name='custom'` and provide a custom classifier head using the `classifier_net` argument. This allows users to apply a custom classifier head on top of the pretrained ProtBERT model.\n",
        "\n",
        "References:\n",
        "\n",
        "[1] Suzek, Baris E., et al. \"UniRef: comprehensive and non-redundant UniProt reference clusters.\" Bioinformatics 23.10 (2007): 1282-1288.\n",
        "\n",
        "[2] Steinegger, Martin, Milot Mirdita, and Johannes Söding. \"Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold.\" Nature Methods 16.7 (2019): 603-606.\n",
        "\n",
        "[3] Steinegger, Martin, and Johannes Söding. \"Clustering huge protein sequence sets in linear time.\" Nature Communications 9.1 (2018): 2542."
      ]
    },
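    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The MLM objective can be illustrated without loading a model. According to the `Rostlab/prot_bert` model card, pretraining followed the original BERT recipe and masked 15% of the amino acids in each input. The sketch below applies that recipe to a space-separated sequence. Each position is selected with probability 0.15. A selected residue is replaced by `[MASK]` 80% of the time, by a random amino acid 10% of the time, and left unchanged otherwise. The 80/10/10 split comes from the original BERT paper and is assumed, not confirmed, for ProtBERT. `mask_sequence` and the 20-letter alphabet are illustrative choices. The MLM loss is computed only at the selected positions, which is what `labels` records."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import random\n",
        "\n",
        "# Assumed alphabet: the 20 standard amino acids.\n",
        "AMINO_ACIDS = list(\"ACDEFGHIKLMNPQRSTVWY\")\n",
        "MASK_TOKEN = \"[MASK]\"\n",
        "\n",
        "\n",
        "def mask_sequence(sequence, mask_prob=0.15, seed=None):\n",
        "    # Returns (tokens, labels): labels[i] is the original residue at selected\n",
        "    # positions and None elsewhere, since the MLM loss ignores unselected ones.\n",
        "    rng = random.Random(seed)\n",
        "    tokens, labels = [], []\n",
        "    for residue in sequence.split():\n",
        "        if rng.random() < mask_prob:\n",
        "            labels.append(residue)\n",
        "            r = rng.random()\n",
        "            if r < 0.8:\n",
        "                tokens.append(MASK_TOKEN)  # 80%: replace with [MASK]\n",
        "            elif r < 0.9:\n",
        "                tokens.append(rng.choice(AMINO_ACIDS))  # 10%: random residue\n",
        "            else:\n",
        "                tokens.append(residue)  # 10%: keep the original residue\n",
        "        else:\n",
        "            labels.append(None)\n",
        "            tokens.append(residue)\n",
        "    return tokens, labels\n",
        "\n",
        "\n",
        "tokens, labels = mask_sequence(\"M F T L K K S Q L L L F F P G T I N L S\", seed=0)\n",
        "print(\" \".join(tokens))\n",
        "print([label for label in labels if label is not None])"
      ]
    },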
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NokdQR09fBr6"
      },
      "source": [
        "## Initializing ProtBERT for Classification"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "A_5BJKvpfDmV",
        "outputId": "8fdc0207-5033-4aee-e59f-239020756df5"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "Some weights of BertForSequenceClassification were not initialized from the model checkpoint at Rostlab/prot_bert and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
            "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
          ]
        }
      ],
      "source": [
        "finetune_model_dir = \"finetuning/\"\n",
        "\n",
        "# Custom classifier head: ProtBERT's hidden size is 1024, and the last layer\n",
        "# produces one output per class (2 classes for water solubility)\n",
        "custom_network = nn.Sequential(nn.Linear(1024, 512),\n",
        "                               nn.ReLU(), nn.Linear(512, 256),\n",
        "                               nn.ReLU(), nn.Linear(256, 2))\n",
        "\n",
        "# ProtBERT model with the custom head, to be fine-tuned on the downstream task\n",
        "ProtBERTmodel_for_classification = ProtBERT(task='classification',\n",
        "                                            model_path=\"Rostlab/prot_bert\",\n",
        "                                            n_tasks=1,\n",
        "                                            cls_name=\"custom\",\n",
        "                                            classifier_net=custom_network,\n",
        "                                            n_classes=2,\n",
        "                                            model_dir=finetune_model_dir,\n",
        "                                            batch_size=32,\n",
        "                                            learning_rate=1e-5,\n",
        "                                            log_frequency=5)  # log the training loss every 5 steps\n",
        "\n"
      ]
    },
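    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cell above uses the `custom` head with the UniRef100 checkpoint. The loading options described in the previous section can also be collected in one place. The helper below is hypothetical (it is not part of DeepChem) and only assembles the keyword arguments already used in this tutorial (`task`, `model_path`, `n_tasks`, `cls_name`, `classifier_net`, `n_classes`). It checks the combinations documented above but does not call DeepChem itself. With it, a model could be created as, for example, `ProtBERT(**protbert_kwargs(cls_name=\"FFN\"), model_dir=finetune_model_dir, batch_size=32, learning_rate=1e-5)`. The defaults `n_tasks=1` and `n_classes=2` simply mirror the cell above."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Hypothetical helper (not part of DeepChem): assemble ProtBERT keyword\n",
        "# arguments for the checkpoints and modes described in \"Loading ProtBERT\".\n",
        "MODEL_PATHS = {\"uniref100\": \"Rostlab/prot_bert\", \"bfd\": \"Rostlab/prot_bert_bfd\"}\n",
        "CLASSIFIER_HEADS = (\"LogReg\", \"FFN\", \"custom\")\n",
        "\n",
        "\n",
        "def protbert_kwargs(pretraining_data=\"uniref100\", task=\"classification\",\n",
        "                    cls_name=\"FFN\", classifier_net=None, n_classes=2):\n",
        "    kwargs = {\"task\": task, \"model_path\": MODEL_PATHS[pretraining_data]}\n",
        "    if task == \"mlm\":\n",
        "        return kwargs  # MLM only needs the task and the checkpoint\n",
        "    if task != \"classification\":\n",
        "        raise ValueError(f\"unsupported task: {task!r}\")\n",
        "    if cls_name not in CLASSIFIER_HEADS:\n",
        "        raise ValueError(f\"cls_name must be one of {CLASSIFIER_HEADS}\")\n",
        "    if cls_name == \"custom\" and classifier_net is None:\n",
        "        raise ValueError(\"cls_name='custom' requires a classifier_net\")\n",
        "    kwargs.update(n_tasks=1, cls_name=cls_name, n_classes=n_classes)\n",
        "    if cls_name == \"custom\":\n",
        "        kwargs[\"classifier_net\"] = classifier_net\n",
        "    return kwargs\n",
        "\n",
        "\n",
        "print(protbert_kwargs(\"bfd\", task=\"mlm\"))\n",
        "print(protbert_kwargs(cls_name=\"LogReg\"))"
      ]
    },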
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "jjRSqdmpJuTc"
      },
      "source": [
        "### Fine-Tuning ProtBERT\n",
        "\n",
        "Fine-tuning adapts the pretrained model to a specific task or dataset, such as protein classification, subcellular localization prediction, or a custom task. ProtBERT has already learned general-purpose representations during pretraining on UniRef100 or BFD. Fine-tuning reuses those representations, so it typically needs far less labeled data and compute than training a model from scratch. In the cells below, the pretrained ProtBERT encoder is frozen, and only the classifier head is trained on the DeepLoc water-solubility labels for one epoch.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "Lwgscj4AxS0X"
      },
      "outputs": [],
      "source": [
        "# Freeze the pretrained ProtBERT encoder so that only the classifier head is trained\n",
        "for param in ProtBERTmodel_for_classification.model.bert.parameters():\n",
        "    param.requires_grad = False\n"
      ]
    },
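    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After freezing, the only weights left to train should be those of the classifier head. This assumes that DeepChem places `classifier_net` directly on top of the frozen `bert` module without adding other trainable layers, which we have not verified. Each `nn.Linear(n_in, n_out)` layer stores `n_in * n_out` weights and `n_out` biases, so the size of `custom_network` can be worked out by hand. The result is roughly 0.66 million trainable parameters, compared with the roughly 420 million reported for the full ProtBERT encoder in the ProtTrans paper. It can be checked against the instantiated model with `sum(p.numel() for p in ProtBERTmodel_for_classification.model.parameters() if p.requires_grad)`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Layer sizes of custom_network above: 1024 -> 512 -> 256 -> 2\n",
        "layer_sizes = [1024, 512, 256, 2]\n",
        "\n",
        "\n",
        "def linear_params(n_in, n_out):\n",
        "    # nn.Linear(n_in, n_out) holds an n_out x n_in weight matrix plus n_out biases\n",
        "    return n_in * n_out + n_out\n",
        "\n",
        "\n",
        "per_layer = [linear_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]\n",
        "print(per_layer, sum(per_layer))  # [524800, 131328, 514] 656642"
      ]
    },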
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "2ekPpvPMxZXf"
      },
      "outputs": [],
      "source": [
        "all_losses = []  # fit() appends the logged training losses to this list\n",
        "loss = ProtBERTmodel_for_classification.fit(deeploc_train_dataset, nb_epoch=1, all_losses=all_losses)"
      ]
    },
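    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`fit` appends the losses it logs to `all_losses`. In DeepChem's `TorchModel.fit`, a loss averaged over the most recent batches is logged every `log_frequency` steps (5 here), and we assume ProtBERT inherits this behaviour. These values are noisy, so a trailing moving average can make the trend easier to read. `moving_average` and its window size are illustrative choices, not part of DeepChem. The smoothed list, e.g. `moving_average(all_losses, window=10)`, could be plotted alongside the raw loss curve."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def moving_average(values, window=10):\n",
        "    # Trailing mean over the last `window` values (fewer at the start).\n",
        "    if window < 1:\n",
        "        raise ValueError(\"window must be at least 1\")\n",
        "    smoothed, running = [], 0.0\n",
        "    for i, value in enumerate(values):\n",
        "        running += value\n",
        "        if i >= window:\n",
        "            running -= values[i - window]  # drop the value leaving the window\n",
        "        smoothed.append(running / min(i + 1, window))\n",
        "    return smoothed\n",
        "\n",
        "\n",
        "print(moving_average([1.0, 0.8, 0.9, 0.6, 0.5], window=2))"
      ]
    },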
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 472
        },
        "id": "5CM0xCB6l3Ds",
        "outputId": "47e95309-5aa1-44ed-d70f-9ae3d79788b1"
      },
      "outputs": [
        {
          "data": {
            "image/png": "iVBORw0KGgoAAAANSUhEUgAAAkAAAAHHCAYAAABXx+fLAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABpz0lEQVR4nO3de3zO9f/H8ce182ZmGNucco7IIcKodBgTOaWcj4lySCyH1NdhiJKQEqVEyqEkynnG5Kwc0kFylsNIYWZss31+f3x+u5gN22y7tut63m+369bn+nzen/f1eu3SvHw+7/fnbTEMw0BERETEgTjZOgARERGRnKYCSERERByOCiARERFxOCqARERExOGoABIRERGHowJIREREHI4KIBEREXE4KoBERETE4agAEhEREYejAkgkl+vevTulS5fO1LmjR4/GYrFkbUAi/2/OnDlYLBZ+/vlnW4cikmEqgEQyyWKxpOsVGRlp61Btonv37nh7e9s6DLuRXGzc/CpatChPPPEEq1atynS/48ePZ+nSpVkXqEge4WLrAETyqnnz5qV4/8UXXxAeHp5qf+XKle/pc2bNmkVSUlKmzv3f//7H66+/fk+fL7nLmDFjKFOmDIZhcPbsWebMmUPTpk354YcfeOaZZzLc3/jx43nuuedo1apV1gcrkoupABLJpM6dO6d4v337dsLDw1Ptv1VsbCxeXl7p/hxXV9dMxQfg4uKCi4v+N88rrly5Qr58+e7Y5umnn6Z27drW9z179sTf358FCxZkqgAScVS6BSaSjR5//HGqVq3Krl27eOyxx/Dy8uKNN94AYNmyZTRr1oxixYrh7u5OuXLlGDt2LImJiSn6uHUM0LFjx7BYLEyaNIlPPvmEcuXK4e7uzsMPP8xPP/2U4ty0xgBZLBb69+/P0qVLqVq1Ku7u7lSpUoXVq1enij8yMpLatWvj4eFBuXLl+Pjjj7N8XNE333xDrVq18PT0xM/Pj86dO3Pq1KkUbaKioujRowclSpTA3d2dwMBAWrZsybFjx6xtfv75Z0JCQvDz88PT05MyZcrwwgsvpCuGjz76iCpVquDu7k6xYsXo168fFy9etB7v378/3t7exMbGpjq3Q4cOBAQEpPjeVq1axaOPPkq+fPnInz8/zZo14/fff09xXvItwsOHD9O0aVPy589Pp06d0hXvzXx9ffH09ExV6E6aNIn69etTuHBhPD09qVWrFosXL07RxmKxcOXKFebOnWu9rda9e3fr8VOnTtGzZ0/rn9EyZcrQp08f4uPjU/QTFxdHaGgoRYoUIV++fLRu3Zp//vknVazp+bmk57sWyQr6p6FINvv33395+umnad++PZ07d8bf3x8wx3R4e3sTGhqKt7c369evZ+TIkURHR/Puu+/etd/58+dz+fJlXnrpJSwWCxMnTuTZZ5/lyJEjd71qtHnzZpYsWULfvn3Jnz8/06ZNo02bNpw4cYLChQsDsGfPHpo0aUJgYCBhYWEkJiYyZswYihQpcu8/lP83Z84cevTowcMPP8yECRM4e/Ys77//Plu2bGHPnj34+voC0KZNG37//XdeeeUVSpcuzblz5wgPD+fEiRPW940bN6ZIkSK8/vrr+Pr6cuzYMZYsWXLXGEaPHk1YWBjBwcH06dOHAwcOMGPGDH766Se2bNmCq6sr7dq1Y/r06axYsYLnn3/eem5sbCw//PAD3bt3x9nZGTBvjXbr1o2QkBDeeecdYmNjmTFjBo888gh79uxJUcxev36dkJAQHnnkESZNmpSuK4OXLl3i/PnzGIbBuXPn+OCDD4iJiUl15fH999+nRYsWdOrUifj4eBYuXMjzzz/P8uXLadasmTXWF198kTp16tC7d28AypUrB8Dp06epU6cOFy9epHfv3lSqVIlTp06xePFiYmNjcXNzs37WK6+8QsGCBRk1ahTHjh1j6tSp9O/fn0WLFlnbpPfncrfvWiTLGC
KSJfr162fc+r9Uw4YNDcCYOXNmqvaxsbGp9r300kuGl5eXce3aNeu+bt26Gffdd5/1/dGjRw3AKFy4sPHff/9Z9y9btswAjB9++MG6b9SoUaliAgw3Nzfj0KFD1n2//PKLARgffPCBdV/z5s0NLy8v49SpU9Z9Bw8eNFxcXFL1mZZu3boZ+fLlu+3x+Ph4o2jRokbVqlWNq1evWvcvX77cAIyRI0cahmEYFy5cMADj3XffvW1f3333nQEYP/30013jutm5c+cMNzc3o3HjxkZiYqJ1/4cffmgAxuzZsw3DMIykpCSjePHiRps2bVKc//XXXxuA8eOPPxqGYRiXL182fH19jV69eqVoFxUVZRQoUCDF/m7duhmA8frrr6cr1s8//9wAUr3c3d2NOXPmpGp/65+v+Ph4o2rVqsaTTz6ZYn++fPmMbt26pTq/a9euhpOTU5o/06SkpBQxBQcHW/cZhmEMGjTIcHZ2Ni5evGgYRvp/Lun5rkWyim6BiWQzd3d3evTokWq/p6endfvy5cucP3+eRx99lNjYWP7888+79tuuXTsKFixoff/oo48CcOTIkbueGxwcbP2XPkC1atXw8fGxnpuYmMi6deto1aoVxYoVs7YrX748Tz/99F37T4+ff/6Zc+fO0bdvXzw8PKz7mzVrRqVKlVixYgVg/pzc3NyIjIzkwoULafaVfKVo+fLlJCQkpDuGdevWER8fz8CBA3FyuvHrsFevXvj4+FhjsFgsPP/886xcuZKYmBhru0WLFlG8eHEeeeQRAMLDw7l48SIdOnTg/Pnz1pezszN169Zlw4YNqWLo06dPuuMFmD59OuHh4YSHh/Pll1/yxBNP8OKLL6a62nXzn68LFy5w6dIlHn30UXbv3n3Xz0hKSmLp0qU0b948xXijZLfeAu3du3eKfY8++iiJiYkcP34cSP/PJT3ftUhWUQEkks2KFy+e4nZBst9//53WrVtToEABfHx8KFKkiPU2xqVLl+7ab6lSpVK8Ty6G0vMXx63nJp+ffO65c+e4evUq5cuXT9UurX2ZkfyX4/3335/qWKVKlazH3d3deeedd1i1ahX+/v489thjTJw4kaioKGv7hg0b0qZNG8LCwvDz86Nly5Z8/vnnxMXFZSoGNzc3ypYtaz0OZsF59epVvv/+ewBiYmJYuXIlzz//vPUv/4MHDwLw5JNPUqRIkRSvtWvXcu7cuRSf4+LiQokSJe7+w7pJnTp1CA4OJjg4mE6dOrFixQoeeOAB+vfvn2JszvLly6lXrx4eHh4UKlSIIkWKMGPGjHT92frnn3+Ijo6matWq6Yrpbn8W0/tzSc93LZJVNAZIJJvd/C/xZBcvXqRhw4b4+PgwZswYypUrh4eHB7t372bYsGHpmvaePObkVoZhZOu5tjBw4ECaN2/O0qVLWbNmDSNGjGDChAmsX7+emjVrYrFYWLx4Mdu3b+eHH35gzZo1vPDCC7z33nts3749S55HVK9ePUqXLs3XX39Nx44d+eGHH7h69Srt2rWztkn+3ubNm0dAQECqPm4dqOzu7p7iylNmODk58cQTT/D+++9z8OBBqlSpwqZNm2jRogWPPfYYH330EYGBgbi6uvL5558zf/78e/q8tNztz1NGfi53+65FsooKIBEbiIyM5N9//2XJkiU89thj1v1Hjx61YVQ3FC1aFA8PDw4dOpTqWFr7MuO+++4D4MCBAzz55JMpjh04cMB6PFm5cuV47bXXeO211zh48CA1atTgvffe48svv7S2qVevHvXq1eOtt95i/vz5dOrUiYULF/Liiy/eNYayZcta98fHx3P06FGCg4NTtG/bti3vv/8+0dHRLFq0iNKlS1OvXr0UMYL587v13Ox0/fp1AOvtuW+//RYPDw/WrFmDu7u7td3nn3+e6ty0ZvQVKVIEHx8ffvvttyyJL6M/l/R81yL3SrfARGwg+V/MN19xiY+P56OPPrJVSCk4OzsTHBzM0qVLOX36tHX/oU
OH7umpwzerXbs2RYsWZebMmSluVa1atYr9+/dbZyrFxsZy7dq1FOeWK1eO/PnzW8+7cOFCqqtXNWrUALjjbbDg4GDc3NyYNm1aivM/++wzLl26ZI0hWbt27YiLi2Pu3LmsXr2atm3bpjgeEhKCj48P48ePT3MsUlpTw+9VQkICa9euxc3NzfrQTWdnZywWS4qp+ceOHUvzic/58uVLMeUfzKtKrVq14ocffkhzmYuMXilM788lPd+1SFbRFSARG6hfvz4FCxakW7duDBgwAIvFwrx583LVLajRo0ezdu1aGjRoQJ8+fUhMTOTDDz+katWq7N27N119JCQkMG7cuFT7CxUqRN++fXnnnXfo0aMHDRs2pEOHDtZp8KVLl2bQoEEA/PXXXzz11FO0bduWBx54ABcXF7777jvOnj1L+/btAZg7dy4fffQRrVu3ply5cly+fJlZs2bh4+ND06ZNbxtfkSJFGD58OGFhYTRp0oQWLVpw4MABPvroIx5++OFUU8sfeughypcvz5tvvklcXFyK218APj4+zJgxgy5duvDQQw/Rvn17ihQpwokTJ1ixYgUNGjTgww8/TNfP7nZWrVplHSR/7tw55s+fz8GDB3n99dfx8fEBzIHkkydPpkmTJnTs2JFz584xffp0ypcvz759+1L0V6tWLdatW8fkyZMpVqwYZcqUoW7duowfP561a9fSsGFDevfuTeXKlTlz5gzffPMNmzdvtg48T4/0/lzS812LZBnbTUATsS+3mwZfpUqVNNtv2bLFqFevnuHp6WkUK1bMGDp0qLFmzRoDMDZs2GBtd7tp8GlNFQaMUaNGWd/fbhp8v379Up173333pZoOHRERYdSsWdNwc3MzypUrZ3z66afGa6+9Znh4eNzmp3BD8jTvtF7lypWztlu0aJFRs2ZNw93d3ShUqJDRqVMn4+TJk9bj58+fN/r162dUqlTJyJcvn1GgQAGjbt26xtdff21ts3v3bqNDhw5GqVKlDHd3d6No0aLGM888Y/z88893jdMwzGnvlSpVMlxdXQ1/f3+jT58+xoULF9Js++abbxqAUb58+dv2t2HDBiMkJMQoUKCA4eHhYZQrV87o3r17inju9piAW6U1Dd7Dw8OoUaOGMWPGjBTT0A3DMD777DOjQoUKhru7u1GpUiXj888/T/PPw59//mk89thjhqenpwGk+DNw/Phxo2vXrkaRIkUMd3d3o2zZska/fv2MuLi4FDHdOlV+w4YNqf4cp+fnkp7vWiSrWAwjF/2TU0RyvVatWvH7779bZ/aIiORFGgMkIrd19erVFO8PHjzIypUrefzxx20TkIhIFtEVIBG5rcDAQLp37259Js6MGTOIi4tjz549VKhQwdbhiYhkmgZBi8htNWnShAULFhAVFYW7uztBQUGMHz9exY+I5Hm6AiQiIiIOR2OARERExOGoABIRERGHozFAaUhKSuL06dPkz58/zcfEi4iISO5jGAaXL1+mWLFid11nTwVQGk6fPk3JkiVtHYaIiIhkwt9//02JEiXu2EYFUBry588PmD/A5EfL3yx57Z3GjRvj6uqa0+HlGOVpfxwlV+VpX5SnfcnOPKOjoylZsqT17/E7UQGUhuTbXj4+PrctgLy8vPDx8bH7P6TK0744Sq7K074oT/uSE3mmZ/iKBkGLiIiIw1EBJCIiIg5HBZCIiIg4HBVAIiIi4nBUAImIiIjDUQEkIiIiDkcFkIiIiDgcFUAiIiLicFQAiYiIiMNRASQiIiIORwWQiIiIOBwVQCIiIuJwVADlsBUr4Pp1W0chIiLi2FQA5aCwMHjmGejfHwzD1tGIiIg4LhVAOejBB8FigY8/hrfesnU0IiIijksFUA569lmYNs3cHjECPv/ctvGIiIg4KhVAOax/f3j9dXO7Vy9YudK28YiIiDgiFUA2MH48dO0KiYnw/POwc6etIxIREXEsKoBswGKBTz+FkBCIjYVmzeDQIVtHJSIi4jhUANmIqy
t88w3UqgXnz5vF0Nmzto5KRETEMagAsqH8+c3nApUtC0eOmFeCYmJsHZWIiIj9UwFkY/7+sHo1+PnBrl3w3HOQkGDrqEREROybCqBcoEIF80qQlxesWQMvvqgHJYqIiGQnFUC5RJ065pggZ2f44gv43/9sHZGIiIj9UgGUizRtCp98Ym6PHw8ffWTbeEREROyVCqBc5oUXYMwYc7t/f/juO9vGIyIiYo9UAOVC//sfvPSSOQ6oQwfYvNnWEYmIiNgXFUC5kMUC06dDy5YQFwfNm8Mff9g6KhEREfuhAiiXcnaG+fMhKAguXoQmTeDUKVtHJSIiYh9UAOViXl7www9QqRL8/Tc8/TRER9s6KhERkbxPBVAuV7iw+aDEwED49Vfo0gWSkmwdlYiISN6mAigPuO8+WLYM3N3h++9h1ChbRyQiIpK3qQDKIx5+2FxBHmDcOPOhiSIiIpI5KoDykM6dYfBgc7t7d9i715bRiIiI5F0qgPKYt982Z4TFxprT5M+ds3VEIiIieY8KoDwmeXp8hQpw4oS5enx8vK2jEhERyVtUAOVBBQuag6F9fGDTJhgwwNYRiYiI5C02L4CmT59O6dKl8fDwoG7duuzcufOO7S9evEi/fv0IDAzE3d2dihUrsnLlSuvx0aNHY7FYUrwqVaqU3WnkuEqVYMEC86nRH38MM2bYOiIREZG8w6YF0KJFiwgNDWXUqFHs3r2b6tWrExISwrnbDGyJj4+nUaNGHDt2jMWLF3PgwAFmzZpF8eLFU7SrUqUKZ86csb422+liWk2bwoQJ5vaAAbBxo23jERERyStcbPnhkydPplevXvTo0QOAmTNnsmLFCmbPns3rr7+eqv3s2bP577//2Lp1K66urgCULl06VTsXFxcCAgKyNfbcYuhQ2LfPHBf03HPw00+Qxo9EREREbmKzAig+Pp5du3YxfPhw6z4nJyeCg4PZtm1bmud8//33BAUF0a9fP5YtW0aRIkXo2LEjw4YNw9nZ2dru4MGDFCtWDA8PD4KCgpgwYQKlSpW6bSxxcXHExcVZ30f//3oTCQkJJCQkpGqfvC+tY7YwYwbs3+/Cnj0WWrQw+PHH6+TLd+/95rY8s4uj5AmOk6vytC/K075kZ54Z6dNiGIaR5RGkw+nTpylevDhbt24lKCjIun/o0KFs3LiRHTt2pDqnUqVKHDt2jE6dOtG3b18OHTpE3759GTBgAKP+//HIq1atIiYmhvvvv58zZ84QFhbGqVOn+O2338ifP3+asYwePZqwsLBU++fPn4+Xl1cWZZy9/vnHgyFDGnLxogf1659iyJCfsVhsHZWIiEjOiY2NpWPHjly6dAkfH587ts1TBVDFihW5du0aR48etV7xmTx5Mu+++y5nzpxJ83MuXrzIfffdx+TJk+nZs2eabdK6AlSyZEnOnz+f5g8wISGB8PBwGjVqZL0Vlxts3WqhUSNnEhIsjBqVyJtv3tuiYbk1z6zmKHmC4+SqPO2L8rQv2ZlndHQ0fn5+6SqAbHYLzM/PD2dnZ86ePZti/9mzZ287ficwMBBXV9cUt7sqV65MVFQU8fHxuLm5pTrH19eXihUrcujQodvG4u7ujru7e6r9rq6ud/xy7nY8pzVsaN4Oe/FFCAtzpkYNZ1q1uvd+c1ue2cVR8gTHyVV52hflaV+yI8+M9GezWWBubm7UqlWLiIgI676kpCQiIiJSXBG6WYMGDTh06BBJNy2H/tdffxEYGJhm8QMQExPD4cOHCQwMzNoEcqmePeGVV8ztLl3gt99sG4+IiEhuZNNp8KGhocyaNYu5c+eyf/9++vTpw5UrV6yzwrp27ZpikHSfPn3477//ePXVV/nrr79YsWIF48ePp1+/ftY2gwcPZuPGjRw7doytW7fSunVrnJ2d6dChQ47nZyvvvQdPPgkxMdCiBfz7r60jEhERyV1sOg2+Xbt2/PPPP4wcOZKoqChq1KjB6tWr8ff3B+DEiRM4Od2o0UqWLMmaNW
sYNGgQ1apVo3jx4rz66qsMGzbM2ubkyZN06NCBf//9lyJFivDII4+wfft2ihQpkuP52YqrK3z9tbmC/NGj0LYtrF5t7hcREREbF0AA/fv3p3///mkei4yMTLUvKCiI7du337a/hQsXZlVoeVrhwrBsGQQFwfr1MHAgfPghmhkmIiJCLlgKQ7LPgw/Cl1+aRc9HH8HUqbaOSEREJHdQAWTnWrWCiRPN7ddeg+++s2k4IiIiuYIKIAfw2mvQpw8YBnTqBHdZb1ZERMTuqQByABYLTJtmLp569So0b24OjhYREXFUKoAchIsLLFwINWrAuXNmMXThgq2jEhERsQ0VQA4kf35YvhxKlIA//4Rnn4X4eFtHJSIikvNUADmY4sVhxQqzGIqMNJfNsM1qcCIiIrajAsgBVasG33wDzs4wbx6Ehdk6IhERkZylAshBhYSYC6eCWQDNnWvbeERERHKSCiAH1qsXvP76je0NG2wbj4iISE5RAeTg3noL2rWDhARo3Rr++MPWEYmIiGQ/FUAOzskJ5syBBg3g0iVzenxUlK2jEhERyV4qgAQPD1i6FMqXh+PHoUULiI21dVQiIiLZRwWQAODnBytXmqvI//STuWRGYqKtoxIREckeKoDEqkIFWLYM3N3NK0LDhumPh4iI2Cf9DScpNGgAX3xhbk+b5szy5WVsG5CIiEg2UAEkqbRtC2+/bW7Pnv0g4eEW2wYkIiKSxVQASZqGDoUePZJISrLQpYszx4/bOiIREZGsowJI0mSxwPvvJ1K+/AX++8/Cc8/BtWu2jkpERCRrqACS2/LwgKFDf6JwYYOff4YBA2wdkYiISNZQASR3VLToVebNS8RigVmz4LPPbB2RiIjIvVMBJHcVHGwwbpy53a8f7Npl23hERETulQogSZfXX4fmzSEuDp57Dv7919YRiYiIZJ4KIEkXJyfz+UDlysGxY3pStIiI5G0qgCTdfH1hyRLw9IQ1a2DMGFtHJCIikjkqgCRDqlWDTz4xt8eMgeXLbRuPiIhIZqgAkgzr3NkcDA3QpQscOWLbeERERDJKBZBkyuTJUK8eXLwIzz4LsbG2jkhERCT9VABJpri5wTffQNGi8Msv0KcPGIatoxIREUkfFUCSaSVKwKJF4OxszhD7+GNbRyQiIpI+KoDknjz++I2V4wcMgB07bBqOiIhIuqgAknv22mvQpg0kJJgPSfznH1tHJCIicmcqgOSeWSwwezbcfz+cPAnt28P167aOSkRE5PZUAEmW8PExH5KYLx+sXw8jRtg6IhERkdtzsXUAYj8eeMC8EtSunTkuKD4eypc3Z4r5+9945c9vXjUSERGxFRVAkqXatoXt22HKFPNZQWnx8EhZFN1aID34IFSpkrNxi4iIY7H5LbDp06dTunRpPDw8qFu3Ljt37rxj+4sXL9KvXz8CAwNxd3enYsWKrFy58p76lKw1cSLMmAEvvQStWkFQkLmIqre3efzaNThxAn76yVxKY/ZsmDABBg6EDh3MAmjhQltmICIi9s6mV4AWLVpEaGgoM2fOpG7dukydOpWQkBAOHDhA0aJFU7WPj4+nUaNGFC1alMWLF1O8eHGOHz+Or69vpvuUrOfiAi+/nPax2Fg4e9Z8nTt3Yzv5degQ7NkDr7wCjRpB4cI5G7uIiDgGmxZAkydPplevXvTo0QOAmTNnsmLFCmbPns3rr7+eqv3s2bP577//2Lp1K66urgCULl36nvqUnOXlBWXKmK+0xMfDQw/B77/DkCHm1SEREZGsZrMCKD4+nl27djF8+HDrPicnJ4KDg9m2bVua53z//fcEBQXRr18/li1bRpEiRejYsSPDhg3D2dk5U30CxMXFERcXZ30fHR0NQEJCAgkJCanaJ+9L65g9sUWeFgvMmGGhYUNnPv/cQvv213niiexdY8NRvk9wnFyVp31RnvYlO/PMSJ82K4DOnz9PYmIi/v7+Kfb7+/vz559/pnnOkSNHWL9+PZ06dWLlypUcOn
SIvn37kpCQwKhRozLVJ8CECRMICwtLtX/t2rV4eXnd9rzw8PA7pWg3bJFnkybVWLWqDD16XGPq1A24uSVl+2c6yvcJjpOr8rQvytO+ZEeesRlYmTtPzQJLSkqiaNGifPLJJzg7O1OrVi1OnTrFu+++y6hRozLd7/DhwwkNDbW+j46OpmTJkjRu3BgfH59U7RMSEggPD6dRo0bWW3H2yJZ5NmgA1aoZnD7tzZ49TQkLy74CyFG+T3CcXJWnfVGe9iU780y+g5MeNiuA/Pz8cHZ25uzZsyn2nz17loCAgDTPCQwMxNXVFWdnZ+u+ypUrExUVRXx8fKb6BHB3d8fd3T3VfldX1zt+OXc7bi9skaefH3z4obnExrvvOtOxozNVq2bvZzrK9wmOk6vytC/K075kR54Z6c9m0+Dd3NyoVasWERER1n1JSUlEREQQFBSU5jkNGjTg0KFDJCXduBrw119/ERgYiJubW6b6lNyrdWto2dJcVuOllyAp+++CiYiIg7Dpc4BCQ0OZNWsWc+fOZf/+/fTp04crV65YZ3B17do1xYDmPn368N9///Hqq6/y119/sWLFCsaPH0+/fv3S3afkHRYLfPCB+fygrVvhk09sHZGIiNgLm44BateuHf/88w8jR44kKiqKGjVqsHr1ausg5hMnTuDkdKNGK1myJGvWrGHQoEFUq1aN4sWL8+qrrzJs2LB09yl5S8mSMH48DBgAw4ZBixZQrJitoxIRkbzO5oOg+/fvT//+/dM8FhkZmWpfUFAQ27dvz3Sfkvf07Qtffgk7d5qF0OLFto5IRETyOpsvhSFyN87O5u0vZ2f49lv4/ntbRyQiInmdCiDJE6pXh8GDze1+/eDyZdvGIyIieZsKIMkzRo40l9A4eRJGjLB1NCIikpepAJI8w8sLZs40t6dNM8cEiYiIZIYKIMlTGjeGzp3BMKB3b7DzJXNERCSbqACSPGfyZChUCH75BaZOtXU0IiKSF6kAkjynSBF47z1ze9QoOHLEtvGIiEjeowJI8qRu3eCJJ+DqVejTx7wlJiIikl4qgCRPsljMAdHu7rB2LSxYYOuIREQkL1EBJHlWxYo3psMPHAj//WfTcEREJA9RASR52pAhUKUK/POPuS0iIpIeKoAkT3Nzu7FK/OzZsGGDbeMREZG8QQWQ5Hn165sDoQG6dIE//rBtPCIikvupABK7MGECVK4Mp07BI4/A1q22jkhERHIzFUBiFwoUgE2boF49uHABgoPhhx9sHZWIiORWKoDEbhQuDOvWQbNm5vOBWreGzz+3dVQiIpIbqQASu5IvH3z3nfmgxMREeOEF8/ZYVj8o0TBg0SIoWxYaNYKkpKztX0REspcKILE7rq7mlZ9hw8z3b7xhPicoq4qUQ4egSRNo3x6OHjWvOq1cmTV9i4hIzlABJHbJYoG334YpU8z306ZBp04QF5f5PuPiYNw4qFrVfPq0uzvUqWMemzz53mMWEZGcowJI7NrAgfDVV+DiAgsXwjPPwOXLGe9nwwaoXt188nRcnDnI+tdf4dtvzb43bIA9e7I8fBERySYqgMTudewIK1aY44PWrTMXUT13Ln3nnjsHXbvCk0/CgQPg7w/z55tXgCpUgBIloG1bs62uAomI5B0qgMQhNG5sXqXx84Ndu6BBAzhy5Pbtk5Jg1iyoVAnmzTNvqfXtC3/+CR06mO+ThYaa/1240HwOkYiI5H4qgMRhPPwwbNkCpUubA5nr14e9e1O327fPfJhi797mM4Vq1IDt22H6dPD1Td2+Vi1o2BCuX4cPP8zeHEREJGuoABKHUrGi+ZToatXg7Fl47DGIjDQv51y5Yi6o+tBDsG0beHubg6h/+unGYOfbSb4KNHMmxMRkcxIiInLPVACJwwkMhB9/NK/aXL4MzzzjzIIF91O9uguTJpnPD2rTBvbvNwdRu7jcvc9nnoHy5eHiRZg7N7szEBGRe6UCSBxSgQKwejU8+yzEx1tYtKgSJ05YKF0ali
+HxYvNAc7p5eQEgwaZ21OmmEWUiIjkXiqAxGF5eMDXX0P//ol4eiYwZEgiv/9uLqWRGd26QcGCcPiw1iETEcntVACJQ3N2hsmTk5g/fyVvvZWEl1fm+8qXD/r0Mbc1JV5EJHdTASRCymnt96JfP3Mpjk2bzMHTIiKSO6kAEslCxYqZzwmCG8twiIhI7qMCSCSLJQ+G/vprOHHCtrGIiEjaVACJZLEaNcylMxIT4YMPbB2NiIikRQWQSDZIfjDiJ59kbvFVERHJXiqARLLB00+b64hFR8Ps2baORkREbqUCSCQb3PxgxKlT9WBEEZHcJlcUQNOnT6d06dJ4eHhQt25ddu7cedu2c+bMwWKxpHh5eHikaNO9e/dUbZo0aZLdaYik0KULFC4Mx47Bd9/ZOhoREbmZzQugRYsWERoayqhRo9i9ezfVq1cnJCSEc+fO3fYcHx8fzpw5Y30dP348VZsmTZqkaLNgwYLsTEMkFU9P6NvX3NaDEUVEchebF0CTJ0+mV69e9OjRgwceeICZM2fi5eXF7DsMnLBYLAQEBFhf/v7+qdq4u7unaFOwYMHsTEMkTX37gpububr8tm22jkZERJKlY53r7BMfH8+uXbsYPny4dZ+TkxPBwcFsu8PfFjExMdx3330kJSXx0EMPMX78eKpUqZKiTWRkJEWLFqVgwYI8+eSTjBs3jsKFC6fZX1xcHHFxcdb30dHRACQkJJCQkJCqffK+tI7ZE+V57woXhg4dnJk714n33ktiwQLbDgbSd2pflKd9UZ5Z13d6WAzDMLI8gnQ6ffo0xYsXZ+vWrQQFBVn3Dx06lI0bN7Jjx45U52zbto2DBw9SrVo1Ll26xKRJk/jxxx/5/fffKfH/y3cvXLgQLy8vypQpw+HDh3njjTfw9vZm27ZtODs7p+pz9OjRhIWFpdo/f/58vO5lcSgR4Pjx/Lz66pM4ORnMmBGOv/9VW4ckImKXYmNj6dixI5cuXcLHx+eObfNcAXSrhIQEKleuTIcOHRg7dmyabY4cOUK5cuVYt24dTz31VKrjaV0BKlmyJOfPn0/zB5iQkEB4eDiNGjXC1dU1PanmScoz6zRr5kx4uBMDBiQyaVJStnxGeug7tS/K074oz3sXHR2Nn59fugogm94C8/Pzw9nZmbNnz6bYf/bsWQICAtLVh6urKzVr1uTQoUO3bVO2bFn8/Pw4dOhQmgWQu7s77u7uafZ9py/nbsfthfK8d6+9BuHhMHu2M2PGOFOgQOb7unYNkpK4p5Xr9Z3aF+VpX5TnvfWZXjYdBO3m5katWrWIiIiw7ktKSiIiIiLFFaE7SUxM5NdffyUwMPC2bU6ePMm///57xzYi2alxY3jgAYiJgU8/zVwff/9tFlJFikDFinDLvxtERCQDbD4LLDQ0lFmzZjF37lz2799Pnz59uHLlCj169ACga9euKQZJjxkzhrVr13LkyBF2795N586dOX78OC+++CJgDpAeMmQI27dv59ixY0RERNCyZUvKly9PSEiITXIUsVhuLI/x/vtw/Xr6z/3tN+jWDcqWNafTx8TAqVMwYED2xCoi4ggyXADNnTuXFStWWN8PHToUX19f6tevn+bzeO6mXbt2TJo0iZEjR1KjRg327t3L6tWrrVPbT5w4wZkzZ6ztL1y4QK9evahcuTJNmzYlOjqarVu38sADDwDg7OzMvn37aNGiBRUrVqRnz57UqlWLTZs2pXmbSySndOoERYuaV3K+/fbObQ0DfvwRnnkGHnwQvvjCLJoefxymTwdnZ3O1+aVLcyJyERH7k+ExQOPHj2fGjBmAOSNr+vTpTJkyheXLlzNo0CCWLFmS4SD69+9P//790zwWGRmZ4v2UKVOYMmXKbfvy9PRkzZo1GY5BJLt5eEC/fjBqFLz3HrRta14ZulliIixbBhMnQvIcAIsF2rSBIUOgTh1z399/w9tvm88Zevxx8PXNyUxERPK+DF8B+vvvvylfvjwAS5cupU2bNvTu3Z
sJEyawadOmLA9QxJ706QPu7vDTT7Bly439167BrFnmOKE2bczix90dXnoJDhyAb765UfwAjBxpjgM6cwYGD875PERE8roMF0De3t78+++/AKxdu5ZGjRoB4OHhwdWrer6JyJ0UKQJdu5rbkyfDxYswYQKULg29e8Nff5lXc958E44fh5kzoUKF1P14et4YTP3ZZ3DTPAIREUmHDBdAjRo14sUXX+TFF1/kr7/+omnTpgD8/vvvlC5dOqvjE7E7Awea/126FEqWhDfeMGd0lShhFkUnTsC4cZDGCi8pPPqoeUsNoFcvuHIlO6MWEbEvGS6Apk+fTlBQEP/88w/ffvutdXmJXbt20aFDhywPUMTePPAAPP20OdA5JgaqVjUHOR85AoMGQf786e9rwgSziDp6FEaMyL6YRUTsTYYHQfv6+vLhhx+m2p/WUhIikrYZM8yrPSEhZjF062Do9MqfHz7+GJo2halTzYHV9eplaagiInYpw1eAVq9ezebNm63vp0+fTo0aNejYsSMXLlzI0uBE7NV995nPA2raNPPFT7Knn4YuXcwrSj17wk2ruoiIyG1kuAAaMmSIdbX0X3/9lddee42mTZty9OhRQpOf9CYiOWrKFHOA9R9/mLfFRETkzjJcAB09etT60MFvv/2WZ555hvHjxzN9+nRWrVqV5QGKyN0VLgzJd6bfegt+/dW28YiI5HYZLoDc3NyIjY0FYN26dTRu3BiAQoUKWa8MiUjOe/55aNnSfGL0Cy9kbLkNERFHk+EC6JFHHiE0NJSxY8eyc+dOmjVrBsBff/1FiRIlsjxAEUkfiwU++ggKFICffzbHGImISNoyXAB9+OGHuLi4sHjxYmbMmEHx4sUBWLVqFU2aNMnyAEUk/YoVM5fZAHNa/KFDto1HRCS3yvA0+FKlSrF8+fJU+++0PpeI5JwXXoD582H9evMBievX3/tMMxERe5PhAgggMTGRpUuXsn//fgCqVKlCixYtcHZ2ztLgRCTjLBZzXbGqVSEy0lwyo1cvW0clIpK7ZPgW2KFDh6hcuTJdu3ZlyZIlLFmyhM6dO1OlShUOHz6cHTGKSAaVLWvOBgNzsdRTp2wbj4hIbpPhAmjAgAGUK1eOv//+m927d7N7925OnDhBmTJlGDBgQHbEKCKZMGCAuYJ8dLS5Cr1h2DoiEZHcI8MF0MaNG5k4cSKFChWy7itcuDBvv/02GzduzNLgRCTznJ3NleJdXeGHH2DRIltHJCKSe2S4AHJ3d+fy5cup9sfExODm5pYlQYlI1qhaFd5809x+5RU4f9628YiI5BYZLoCeeeYZevfuzY4dOzAMA8Mw2L59Oy+//DItWrTIjhhF5B4MH24WQufPw2uvaaKCiAhkYhbYtGnT6NatG0FBQbi6ugJw/fp1WrRowdSpU7M6PhG5R25u5q2woCBYsMCJ8+erc+6chdq1oUoV87iIiKPJcAHk6+vLsmXLOHTokHUafOXKlSlfvnyWByciWaNOHQgNhUmTIDy8NOHh5n5XV/PqUM2a8NBD5n+rV4d8+Wwbr4hIdsvUc4AAypcvn6Lo2bdvH7Vr1yY+Pj5LAhORrPXOO1Cr1nUWLDjGpUtl2bvXiUuXYM8e8zV7ttnOYoH77zeLoZsLo5vmPYiI5HmZLoBuZRgGiYmJWdWdiGQxJydo08bA0/N3mja9DxcXJ44dM4uf3btvFEJnzsCff5qvBQtunF+nDnz/Pfj72ywFEZEsk2UFkIjkLRYLlCljvp599sb+qKgbxVByYXTkCOzcCa1awYYN4OFhs7BFRLKECiARSSEgAJ5+2nwl278f6teH7duhZ0/48kutLyYieVu6C6Do6Og7Hk/r2UAiYh8qV4bFiyEkxFxotXJl+N//bB2ViEjmpbsA8vX1xXKHf/IZhnHH4yKStz31FEyfDi+/DCNGmAOln3/e1lGJiGROugugDRs2ZGccIpIHvPSSeTvs/fehWzdz/FDt2raOSkQk49JdADVs2D
A74xCRPOK99+Cvv2DVKmjRwhwcXaKEraMSEcmYDC+FISKOzdkZFi40nyJ95oxZBF25YuuoREQyRgWQiGSYj4+5wryfnzlNvmtXSEqydVQiIumnAkhEMqVMGVi61FxLbMkSzQoTkbxFBZCIZFqDBvDpp+b2hAnwxRe2jUdEJL1UAInIPenSBYYPN7d79YLNm20bj4hIemT4SdCtW7dO83k/FosFDw8PypcvT8eOHbn//vuzJEARyf3GjYMDB8xbYa1bw08/QenSto5KROT2MnwFqECBAqxfv57du3djsViwWCzs2bOH9evXc/36dRYtWkT16tXZsmVLdsQrIrmQk5N5++uhh+D8eXjmGbjLw+NFRGwqwwVQQEAAHTt25MiRI3z77bd8++23HD58mM6dO1OuXDn2799Pt27dGDZsWHbEKyK5VL585mrxgYHw++/QoQMkJto6KhGRtGW4APrss88YOHAgTk43TnVycuKVV17hk08+wWKx0L9/f3777bd09zl9+nRKly6Nh4cHdevWZefOnbdtO2fOHOuVp+SXxy1LUxuGwciRIwkMDMTT05Pg4GAOHjyY0VRFJIOKFzeLIE9PWLkSBg+2dUQiImnLcAF0/fp1/vzzz1T7//zzTxL//597Hh4e6V4XbNGiRYSGhjJq1Ch2795N9erVCQkJ4dy5c7c9x8fHhzNnzlhfx48fT3F84sSJTJs2jZkzZ7Jjxw7y5ctHSEgI165dy0CmIpIZtWvfmA02dSp88olNwxERSVOGC6AuXbrQs2dPpkyZwubNm9m8eTNTpkyhZ8+edO3aFYCNGzdSpUqVdPU3efJkevXqRY8ePXjggQeYOXMmXl5ezJ49+7bnWCwWAgICrC9/f3/rMcMwmDp1Kv/73/9o2bIl1apV44svvuD06dMsXbo0o+mKSCY89xyMHWtu9+tnXg0SEclNMjwLbMqUKfj7+zNx4kTOnj0LgL+/P4MGDbKO+2ncuDFNmjS5a1/x8fHs2rWL4clzaDFvpwUHB7Nt27bbnhcTE8N9991HUlISDz30EOPHj7cWXEePHiUqKorg4GBr+wIFClC3bl22bdtG+/btU/UXFxdHXFyc9X30/4/eTEhIICEhIVX75H1pHbMnytP+5GSuQ4fCH384s2CBE888Y/DGG0m8+WYSLhn+rZNxjvKdKk/7ojyzru/0sBiGYWT2g5ILBR8fn0ydf/r0aYoXL87WrVsJCgqy7h86dCgbN25kx44dqc7Ztm0bBw8epFq1aly6dIlJkybx448/8vvvv1OiRAm2bt1KgwYNOH36NIGBgdbz2rZti8ViYdGiRan6HD16NGFhYan2z58/Hy8vr0zlJiKQkODExx9XY926+wCoUuU8gwbtws9Pt6NFJOvFxsbSsWNHLl26dNfa5J7+LZbZwudeBAUFpSiW6tevT+XKlfn4448Zm3zNPYOGDx9OaGio9X10dDQlS5akcePGaeaYkJBAeHg4jRo1wtXVNVOfmRcoT/tji1xbtoSFC6/Tt68zv//ux7Bhjfn000SaNcv0v73uylG+U+VpX5TnvYvOwPM3MlwAnT17lsGDBxMREcG5c+e49QJSYgbmvfr5+eHs7Gy9lXbzZwQEBKSrD1dXV2rWrMmhQ4cArOedPXs2xRWgs2fPUqNGjTT7cHd3x93dPc2+7/Tl3O24vVCe9ienc+3SBYKCoF072L3bQuvWLoSGmstnuLll3+c6yneqPO2L8ry3PtMrw4Ogu3fvzu7duxkxYgSLFy9myZIlKV4Z4ebmRq1atYiIiLDuS0pKIiIiIsVVnjtJTEzk119/tRY7ZcqUISAgIEWf0dHR7NixI919ikjWK18etm6FAQPM95MnwyOPwJEjto1LRBxThq8Abd68mU2bNt32akpGhYaG0q1bN2rXrk2dOnWYOnUqV65coUePHgB07dqV4sWLM2HCBADGjBlDvXr1KF++PBcvXuTdd9/l+PHjvPjii4A5Q2zgwIGMGzeOCh
UqUKZMGUaMGEGxYsVo1apVlsQsIpnj7g7vvw9PPQXdu5tLZtSsaU6Vb9fO1tGJiCPJcAFUsmTJVLe97kW7du34559/GDlyJFFRUdSoUYPVq1dbp7afOHEixUMXL1y4QK9evYiKiqJgwYLUqlWLrVu38sADD1jbDB06lCtXrtC7d28uXrzII488wurVq1M9MFFEbKNFC9i7Fzp2hC1boH17iIgwnxukeQcikhMyfAts6tSpvP766xw7dizLgujfvz/Hjx8nLi6OHTt2ULduXeuxyMhI5syZY30/ZcoUa9uoqChWrFhBzZo1U/RnsVgYM2YMUVFRXLt2jXXr1lGxYsUsi1dE7l2pUhAZCW+8ARYLzJoFderAH3/YOjIRcQQZLoDatWtHZGQk5cqVI3/+/BQqVCjFS0QkvVxc4K23YO1a8Pc31xCrXRtmz4YsvNAsIpJKhm+BTZ06NRvCEBFHFhxs3hLr2hXCw6FnT/OW2IwZYIOnbYiIA8hwAdStW7fsiENEHFxAAKxeDe+8AyNGwPz55qyxPn2gc2coVszWEYqIPUnXLbCbHywUHR19x5eISGY5OcHw4bBxI5QsCceOwbBh5nZIiFkUxcbaOkoRsQfpugJUsGBBzpw5Q9GiRfH19U1zpXfDMLBYLBl6EKKISFoaNDDHAy1cCHPnmjPF1q41X/nzw/PPm7fLHn3ULJpERDIqXQXQ+vXrrQOcN2zYkK0BiYiAWej06mW+Dh2CefPgiy/Mq0KzZ5uv0qXNp0x37Wo+aFFEJL3SVQA1bNgwzW0RkZxQvjyEhcGoUbB5s1kIff21WQyNHWu+6teHbt2gbVvIl8/WEYtIbpepxVAvXrzIzp07OXfuHElJSSmOde3aNUsCExG5lZMTPPaY+Zo2DZYtM2+RhYebA6aTl9po3tyZRx7Jb+twRSQXy3AB9MMPP9CpUydiYmLw8fFJMR7IYrGoABKRHOHlBR06mK/Tp+Grr8xi6PffYfFiJ9aubUDLluZtMhGRW2V4+OBrr73GCy+8QExMDBcvXuTChQvW13///ZcdMYqI3FGxYjBkCPz6K+zaBdWqGURHu9OlizPXr9s6OhHJjTJcAJ06dYoBAwbgpQV7RCSXsVjgoYdgwYLreHomsHmzE6NG2ToqEcmNMlwAhYSE8PPPP2dHLCIiWaJCBejXby8A48fDmjW2jUdEcp8MjwFq1qwZQ4YM4Y8//uDBBx/E1dU1xfEWLVpkWXAiIpn1yCOnuXw5kY8/dqZzZ3OpjeLFbR2ViOQWGS6AevXqBcCYMWNSHdODEEUkN3n33SR27HBm715zsPT69eYCrCIiGb4FlpSUdNuXih8RyU08PMznBeXPD5s2ofFAImKlh8iLiF2rUAFmzTK3NR5IRJKl62LwtGnT6N27Nx4eHkybNu2ObQcMGJAlgYmIZJV27cwFVmfMQOOBRARIZwE0ZcoUOnXqhIeHB1OmTLltO4vFogJIRHKlyZNh2zY0HkhEgHQWQEePHk1zW0Qkr/DwgG++MZ8TlDwe6K23bB2ViNiKxgCJiMMoXx4+/dTc1nggEceWqQvAJ0+e5Pvvv+fEiRPEx8enODZ58uQsCUxEJDu0bWuOB/roI40HEnFkGS6AIiIiaNGiBWXLluXPP/+katWqHDt2DMMweOihh7IjRhGRLPXee+Z4oD17NB5IxFFl+BbY8OHDGTx4ML/++iseHh58++23/P333zRs2JDnn38+O2IUEclSej6QiGS4ANq/fz9du3YFwMXFhatXr+Lt7c2YMWN45513sjxAEZHsoPFAIo4twwVQvnz5rON+AgMDOXz4sPXY+fPnsy4yEZFs1rYt9O1rbnfuDKdO2TYeEck5GS6A6tWrx+bNmwFo2rQpr732Gm+99RYvvPAC9erVy/IARUSy03vvQc2acP68OR7o+nVbRyQiOSHDw/4mT55MTEwMAGFhYcTExLBo0SIqVKigGWAikuckjwdKfj7Q8OHQvTtER8Ply+Yref
t2/718GWJizCtKaawTLSK5UIYKoMTERE6ePEm1atUA83bYzJkzsyUwEZGckjweqF07mDTJfGXGuHHQtavZn4jkbhkqgJydnWncuDH79+/H19c3m0ISEcl5bdvC7t3wwQfg5WXOEPPxMf978/bt/jtpEmzYANOmmS8Ryd0yfAusatWqHDlyhDJlymRHPCIiNvP22+YrM1xdzQJo9mzzNpj+jSiSu2V4EPS4ceMYPHgwy5cv58yZM0RHR6d4iYg4ouBgqFIFrly5Mb1eRHKvdBdAY8aM4cqVKzRt2pRffvmFFi1aUKJECQoWLEjBggXx9fWlYMGC2RmriEiuZbHAwIHm9gcfaDaZSG6X7ltgYWFhvPzyy2zYsCE74xERybM6dTJnkZ04AUuWmOOKRCR3SncBZBgGAA0bNsy2YERE8jJPT+jTB8aOhalTVQCJ5GYZGgNksViyKw4REbvQty+4uZmLre7YYetoROR2MlQAVaxYkUKFCt3xlRnTp0+ndOnSeHh4ULduXXbu3Jmu8xYuXIjFYqFVq1Yp9nfv3h2LxZLi1aRJk0zFJiKSEQEB5hOlAaZMsW0sInJ7GZoGHxYWRoECBbI0gEWLFhEaGsrMmTOpW7cuU6dOJSQkhAMHDlC0aNHbnnfs2DEGDx7Mo48+mubxJk2a8Pnnn1vfu7u7Z2ncIiK3M2gQzJ0Lixeb44FKlbJ1RCJyqwwVQO3bt79jUZIZkydPplevXvTo0QOAmTNnsmLFCmbPns3rr7+e5jmJiYl06tSJsLAwNm3axMWLF1O1cXd3JyAgIEtjFRFJj+rV4YknzOcCffghTJxo64hE5FbpLoCyY/xPfHw8u3btYvjw4dZ9Tk5OBAcHs23bttueN2bMGIoWLUrPnj3ZtGlTmm0iIyMpWrQoBQsW5Mknn2TcuHEULlw4zbZxcXHExcVZ3yc/zyghIYGEhIRU7ZP3pXXMnihP++MoueaGPF95xcKGDS588onB8OHX8fbO+s/IDXnmBOVpX7Izz4z0aTGSp3fdhZOTE1FRUVl6Bej06dMUL16crVu3EhQUZN0/dOhQNm7cyI40RhBu3ryZ9u3bs3fvXvz8/OjevTsXL15k6dKl1jYLFy7Ey8uLMmXKcPjwYd544w28vb3Ztm0bzs7OqfocPXo0YWFhqfbPnz8fLy+vrElWRBxKUhL06/cUZ85406vXPpo1O2rrkETsXmxsLB07duTSpUv4+PjcsW26rwAlJSXdc2D36vLly3Tp0oVZs2bh5+d323bt27e3bj/44INUq1aNcuXKERkZyVNPPZWq/fDhwwkNDbW+j46OpmTJkjRu3DjNH2BCQgLh4eE0atQIV1fXe8wq91Ke9sdRcs0teZ444cTAgbBhw4N88EFlnDL87P07yy15ZjflaV+yM8+MrEiR4bXAspKfnx/Ozs6cPXs2xf6zZ8+mOX7n8OHDHDt2jObNm1v3JRdmLi4uHDhwgHLlyqU6r2zZsvj5+XHo0KE0CyB3d/c0B0m7urre8cu523F7oTztj6Pkaus8e/aE0aPh0CELa9e6ctOvrixl6zxzivK0L9mRZ0b6y+J/j2SMm5sbtWrVIiIiwrovKSmJiIiIFLfEklWqVIlff/2VvXv3Wl8tWrTgiSeeYO/evZQsWTLNzzl58iT//vsvgYGB2ZaLiMitvL2hVy9zW1PiRXIXm14BAggNDaVbt27Url2bOnXqMHXqVK5cuWKdFda1a1eKFy/OhAkT8PDwoGrVqinO9/3/JZeT98fExBAWFkabNm0ICAjg8OHDDB06lPLlyxMSEpKjuYmIvPIKTJ5szgjbuxdq1LB1RCICuaAAateuHf/88w8jR44kKiqKGjVqsHr1avz9/QE4ceIEThm4ce7s7My+ffuYO3cuFy9epFixYjRu3JixY8fqWUAikuNKloTnnoNFi8zlMebMsXVEIgK5oAAC6N+/P/3790/zWGRk5B3PnXPLbxNPT0/WrFmTRZGJiNy7QYPMAm
jBAnj7bfNp0SJiWzYdAyQi4gjq1oWgIIiPh48+snU0IgIqgEREcsTAgeZ/Z8yAq1dtGoqIoAJIRCRHPPusuSbY+fPw1Ve2jkZEVACJiOQAFxdzRhiYg6HT9wx+EckuKoBERHLIiy9Cvnzw+++wbp2toxFxbCqARERyiK8vvPCCua0HI4rYlgogEZEc9OqrYLHAqlWwf7+toxFxXLniOUAiIo6iXDlo0QKWLYP334eZMzPXT3Q0fPONhQ0bKrJvnxOuruDklP6Xt7cZh4dH1uYnkleoABIRyWGDBpkF0BdfwFtvQeHC6TvPMGDzZvjsM/jmG4iNdQEqZzqO//0Pxo7N9OkieZoKIBGRHPbYY+aaYHv3wscfwxtv3Ln9mTMwdy7Mng0HD97YX7GiQalSJ/5/IWgnkpJI82UYKd//959ZSM2ZA2Fh5hUhEUejAkhEJIdZLOZVoG7d4MMPYfBgcHNL2SYhAVasMK/2rFoFiYnm/nz5oH17czB17drXWbVqL02bFsPVNf1VzLVrEBgIJ09CZCQ8+WTW5SaSV6juFxGxgfbtzTXBzpyBr7++sf/PP2HIEChRAlq3huXLzeKnQQOzGIqKgk8/hfr1zUIqMzw8oG1bc/uLL+49F5G8SAWQiIgNuLlBv37m9uTJZnHToAFUrgyTJsG5c+DvbxZD+/ebt6xeeMEcvJwVunY1//vtt3DlStb0KZKXqAASEbGRl182r8bs2WM+JHHrVnB2NmdnLV0Kf/8NEydCpUpZ/9n160PZshATY36WiKNRASQiYiN+ftC3r7ldsSK8/bZZ9CxbBi1bgqtr9n22xQJdupjb8+Zl3+eI5FYqgEREbGjiRDh2zBz7M2yYOTg5p3TubP43PNwciyTiSFQAiYjYkLMz3Hdf5gc034vy5c1bYUlJMH9+zn++iC2pABIRcWDJt8E0G0wcjQogEREH1ratOSNt3z745RdbRyOSc1QAiYg4sEKF4JlnzG0NhhZHogJIRMTBJT8T6Kuv4Pp128YiklNUAImIOLinnzYXZI2KgogIW0cjkjNUAImIODg3N3NpDtBtMHEcKoBERMQ6G+y77+DyZdvGIpITVACJiAh16phPo46NhSVLbB2NSPZTASQiIloaQxyOCiAREQFuLI2xfj2cPGnbWESymwogEREBoHRpeOwxMAxzSryIPVMBJCIiVsnPBPriC7MQErFXKoBERMTquefAwwP++AP27LF1NCLZRwWQiIhYFSgALVua21ogVeyZCiAREUkheTbYggWQkGDbWESyiwogERFJoXFjKFoUzp2DtWttHY1I9lABJCIiKbi6QocO5raeCST2SgWQiIikkjwbbOlSuHTJpqGIZAsVQCIikkrNmvDAAxAXB4sX2zoakayXKwqg6dOnU7p0aTw8PKhbty47d+5M13kLFy7EYrHQqlWrFPsNw2DkyJEEBgbi6elJcHAwBw8ezIbIRUTsk8WS8plAIvbG5gXQokWLCA0NZdSoUezevZvq1asTEhLCuXPn7njesWPHGDx4MI8++miqYxMnTmTatGnMnDmTHTt2kC9fPkJCQrh27Vp2pSEiYnc6dTILoR9/hGPHbB2NSNayeQE0efJkevXqRY8ePXjggQeYOXMmXl5ezJ49+7bnJCYm0qlTJ8LCwihbtmyKY4ZhMHXqVP73v//RsmVLqlWrxhdffMHp06dZunRpNmcjImI/SpSAJ580t7/80raxiGQ1F1t+eHx8PLt27WL48OHWfU5OTgQHB7Nt27bbnjdmzBiKFi1Kz5492bRpU4pjR48eJSoqiuDgYOu+AgUKULduXbZt20b79u1T9RcXF0dcXJz1fXR0NAAJCQkkpPEQjOR9aR2zJ8rT/jhKrsoz63ToYCEiwoUvvjAYOvQ6Fku2fdRt6fu0L9mZZ0b6tGkBdP78eRITE/H390+x39/fnz///DPNczZv3sxnn33G3r170zweFRVl7ePWPpOP3WrChAmEhYWl2r927Vq8vL
xuG394ePhtj9kT5Wl/HCVX5Xnv8uVzwd09hIMHXXj//W1UrHgh2z7rbvR92pfsyDM2NjbdbW1aAGXU5cuX6dKlC7NmzcLPzy/L+h0+fDihoaHW99HR0ZQsWZLGjRvj4+OTqn1CQgLh4eE0atQIV1fXLIsjt1Ge9sdRclWeWev7751YsACOHm3AwIFJ2fY5t6Pv075kZ57Jd3DSw6YFkJ+fH87Ozpw9ezbF/rNnzxIQEJCq/eHDhzl27BjNmze37ktKMv9ndHFx4cCBA9bzzp49S2BgYIo+a9SokWYc7u7uuLu7p9rv6up6xy/nbsfthfK0P46Sq/LMGt27m8tiLFrkzNSpzri5ZbyPQ4dg82Zo3hwKF85cHPo+7Ut25JmR/mw6CNrNzY1atWoRERFh3ZeUlERERARBQUGp2leqVIlff/2VvXv3Wl8tWrTgiSeeYO/evZQsWZIyZcoQEBCQos/o6Gh27NiRZp8iInJnTz0FgYHw33+walX6zztyBN55B2rVggoVoEcPaNQIrlzJvlhF0svmt8BCQ0Pp1q0btWvXpk6dOkydOpUrV67Qo0cPALp27Urx4sWZMGECHh4eVK1aNcX5vr6+ACn2Dxw4kHHjxlGhQgXKlCnDiBEjKFasWKrnBYmIyN05O5tT4idNMp8JlLxafFqOH4dvvoGvv4affkrZh4cH7NkDnTvDt9+Ck83nIYsjs3kB1K5dO/755x9GjhxJVFQUNWrUYPXq1dZBzCdOnMApg/+XDB06lCtXrtC7d28uXrzII488wurVq/Hw8MiOFERE7F6XLmYBtHw5XLgABQveOPb33zeKnh07bux3coInnoC2baF1azh40JxWv3QpvPEGvP12jqchYmXzAgigf//+9O/fP81jkZGRdzx3zpw5qfZZLBbGjBnDmDFjsiA6ERGpVg2qV4dffjELnWeeuVH03PzUEosFHn/cLHqefdZcVT5ZkSLw2WfmFaB33oH77zdvi4nYQq4ogEREJPfr2hVeew0GD4aXX76x32KBRx81i542bSCNOSxWnTrBgQMwdiy89BKULQsNG2Z/7CK30h1YERFJl44dwcUFYmLM9488AtOmwcmTsHEj9Ot35+In2ejRZrGUkGBeJTp0KFvDFkmTrgCJiEi6BASYY4COHDGns5cokbl+nJxgzhw4etQcKP3MM+ZttJvHFYlkNxVAIiKSbiEhWdOPpycsWwZ16pi3xNq2hZUrwQEefyO5hG6BiYiITQQGwg8/QL58sG4dDBgAhmHrqMRRqAASERGbqVED5s83B1LPnAkffGDriMRRqAASERGbatECJk40twcNytjTpu9k2zbzqtLKlVnTn9gXFUAiImJzr70GL7wASUnQrh389lvm+rl+3Xw+UVAQ1K9vXlFq1szsMyoqa2OWvE0FkIiI2JzFAjNmmM8EunzZnBl27lz6z4+OhilToHx5c0D19u3g5mYO2nZ2Nh/YWLmy+SBGjTMSUAEkIiK5hJubuUZY+fLmmmKtWsG1a3c+59gxCA01p+SHhprn+fnByJHm9urV5lT7hx6CixfhxRfN5TgOHsyBhCRXUwEkIiK5RuHC5rOGfH3NMTwvvpj2FZtt2+D556FcOfPKz+XL5hWeTz6BEycgLOzGQxlr1jTXKJs0yZx+HxkJDz4IEyaYD2MUx6QCSEREcpX774fFi81bV199BRMmmH9V3Tq+Z/Fic8xQo0bmwOnffoNevcwi51YuLuY4o99+M9vHxZkLstauDTt35nCCkiuoABIRkVznqafgo4/M7dGjnfn442pUruySYnzPCy/Avn2wdi00aWI+YfpuypaFNWvgiy/Mq0379pkF1cCBN5b4EMegAkhERHKl3r3NwgRg1aoyHD9usY7vOXHCHND84IMZ79digS5dYP9+c2X6pCR4/32oUkVT5h2JCiAREcm1Jk2Cbt2SqFDhAjNmXLeO7/H3v/e+ixSBefPMgdKlS5tFVbNm5qKvGZmBJnmTCiAREc
m1nJ1h1qxE3n33R3r2NNIc33OvQkLMsUGhoeZttAULzAHVH3wAmzebi7bGxWX954ptaTFUERFxePnywXvvQYcO5kDqvXvNp0jfzM8PihdP+SpRIuX7ggXNW2yS+6kAEhER+X/Js8I++ACWLoVTp8xXXBycP2++fvnl9ud7epqF0KBB0LdvjoUtmaACSERE5CaurubtsNBQ871hwH//3SiGkl8nT6Z8/++/cPUqHDpkntu8OZQsadtc5PZUAImIiNyBxWJOmS9cGKpVu327a9fg9Gno3h02bYKxY80HM0rupEHQIiIiWcDDw3zO0Ntvm+9nz4a//rJtTHJ7KoBERESyUP365mKuiYnmM4skd1IBJCIiksXeesu8dbZokTmjTHIfFUAiIiJZrFo1c0o9wJtv2jYWSZsKIBERkWwQFmYuwrpypflARcldVACJiIhkg/LloWdPc3v4cHM6veQeKoBERESyyYgR5uywzZvNNcck91ABJCIikk2KF4f+/c3tN980V56X3EEFkIiISDZ6/XXInx/27IHFi7O+/4MHoWJFeOGFrO/bnqkAEhERyUaFC8Pgweb2iBFw/XrW9X3+PDRtahZBn39urlwv6aMCSEREJJsNGmSuJv/XXzB3btb0ee0atGplrj2W7Msvs6ZvR6ACSEREJJvlzw9vvGFujx5tFi/3IikJevSALVugQAHzNhvAF19otll6qQASERHJAX36QIkS5iryM2feW18jRsDCheZzhpYsMQdYe3mZV4N27MiaeO2dCiAREZEc4OEBo0aZ22+9BZcvZ66f2bNh/Hhz+9NP4cknwdsb2rQx933xxb3H6ghUAImIiOSQ7t3NGVvnz8OUKRk/PzwcXnrJ3B4xArp1u3Gsa1fzvwsXQlzcPYdq93JFATR9+nRKly6Nh4cHdevWZefOnbdtu2TJEmrXro2vry/58uWjRo0azJs3L0Wb7t27Y7FYUryaNGmS3WmIiIjckYsLjB1rbk+aBP/+m/5zf/sNnnvOnEXWqZO51MbNnngCihWDCxdgxYqsi9le2bwAWrRoEaGhoYwaNYrdu3dTvXp1QkJCOHfuXJrtCxUqxJtvvsm2bdvYt28fPXr0oEePHqxZsyZFuyZNmnDmzBnra8GCBTmRjoiIyB099xzUqGHeAnv77fSdc+aMOd09Ohoeeww++8xcbf5mzs7QubO5fct1AUmDzQugyZMn06tXL3r06MEDDzzAzJkz8fLyYvbs2Wm2f/zxx2ndujWVK1emXLlyvPrqq1SrVo3Nt6w05+7uTkBAgPVVsGDBnEhHRETkjpycbozh+fBDOHXqzu2vXIHmzeHvv83bZ999B+7uabft0sX874oV5m02uT0XW354fHw8u3btYvjw4dZ9Tk5OBAcHs23btruebxgG69ev58CBA7zzzjspjkVGRlK0aFEKFizIk08+ybhx4yhcuHCa/cTFxRF30w3T6OhoABISEkhISEjVPnlfWsfsifK0P46Sq/K0L/aY51NPQYMGzmzZ4kRYWCLTpyelmWdiIrRr58yuXU74+RksW3ad/Pnhdj+K+++HmjVd2LPHwvz5ifTpk/vW3sjO7zMjfVoMw3ZPDDh9+jTFixdn69atBAUFWfcPHTqUjRs3suM2c/kuXbpE8eLFiYuLw9nZmY8++ogXbnoG+MKFC/Hy8qJMmTIcPnyYN954A29vb7Zt24azs3Oq/kaPHk3YrTdTgfnz5+Pl5ZUFmYqIiKT0xx+FeOONR3F2TuLDD9cTGHglVZtPP63K8uXlcHVNZOzYLVSqdOGu/X7/fVlmz36QChUu8O67P2ZH6LlWbGwsHTt25NKlS/j4+NyxbZ4sgJKSkjhy5AgxMTFEREQwduxYli5dyuOPP55m+yNHjlCuXDnWrVvHU089lep4WleASpYsyfnz59P8ASYkJBAeHk6jRo1wdXXNYNZ5h/K0P46Sq/K0L/acZ8uWzqxa5UT79kl89tm1FHl++KEToaHmP9rnz7
/Oc8+l76/rs2ehdGkXEhMt/PprAvffn50ZZFx2fp/R0dH4+fmlqwCy6S0wPz8/nJ2dOXv2bIr9Z8+eJSAg4LbnOTk5Ub58eQBq1KjB/v37mTBhwm0LoLJly+Ln58ehQ4fSLIDc3d1xT+OGqqur6x2/nLsdtxfK0/44Sq7K077YY55vvQWrVsGiRU4MHmzm5urqyqpVrrz2mtnmnXegQ4f0/3VdogQ0aWKOA1q40JVx47Ij8nuXHd9nRvqz6SBoNzc3atWqRUREhHVfUlISERERKa4I3U1SUlKKKzi3OnnyJP/++y+BgYH3FK+IiEhWqlkT2rUzl68YOdK82rN7N3ToYO7r3RuGDMl4v8nPBJo3z1w2Q1Kz+Syw0NBQZs2axdy5c9m/fz99+vThypUr9OjRA4CuXbumGCQ9YcIEwsPDOXLkCPv37+e9995j3rx5dP7/uX8xMTEMGTKE7du3c+zYMSIiImjZsiXly5cnJCTEJjmKiIjczpgx5hT2lSud2LSpOK1auRAbCyEhMH166unu6dG8Ofj4wIkT8KNjDQNKN5veAgNo164d//zzDyNHjiQqKooaNWqwevVq/P39AThx4gROTjfqtCtXrtC3b19OnjyJp6cnlSpV4ssvv6Rdu3YAODs7s2/fPubOncvFixcpVqwYjRs3ZuzYsWne5hIREbGlihXNhU0//RTee682AA8+CF9/bT44MTM8PaFtW7PPL76A24wQcWg2L4AA+vfvT//+/dM8FhkZmeL9uHHjGHeHG5qenp6pHoooIiKSm40cCfPmGcTFWShWzGDFCgt3GcN7V127mgXQ4sXm84Y0qTklm98CExERcXQlS8K4cUncd98lvvvuOiVL3nufDRpAmTLmE6eXLbv3/uyNCiAREZFc4NVXk3j//Uhq1sya/pycbiyNoRXiU1MBJCIiYqeSl8ZYu9ZcT0xuUAEkIiJipypUgKAgcyq81gRPSQWQiIiIHUt+JpBug6WkAkhERMSOtW0Lbm7wyy/mS0wqgEREROxYoULwzDPm9rx5to0lN1EBJCIiYueSb4N99RVcv27bWHILFUAiIiJ27umnoXBhiIqCm5bfdGgqgEREROycmxu0b29uZ8Vg6FOnzKKqQweIjr73/mxBBZCIiIgDSL4N9t135tOhM2vfPqhbF1avhoUL4dFHzYIor1EBJCIi4gAefhjuvx+uXoVvv81cH2vWwCOPmAXP/feDv79ZENWrB7/+mrXxZjcVQCIiIg7AYrm3ZwJ9+ik0a2ZePXr8cdi2DbZvh0qV4ORJszBavz5LQ85WKoBEREQcRKdO5n83bIDjx9N3TlISvPkm9OoFiYnm+mKrV0PBglC6NGzdCo89Zo4FatIEvvwy28LPUiqAREREHMR995lXb8CcEn83cXFmwTN+vPl+5Ejz6pG7+402BQuat8batYOEBHP9sfHjwTCyPPwspQJIRETEgSTfBps3785Fyr//QqNG5hpiLi7w+ecQFmbeSruVhwfMnw9Dhpjv33wTXn45dz9zSAWQiIiIA2nTBjw94c8/4eef025z+DDUrw+bNoGPj3nLq3v3O/fr5AQTJ8KHH5rbn3wCLVtCTEyWp5AlVACJiIg4EB8faNXK3E5rMPS2beasrr/+glKlYMsWeOqp9Pffrx8sWWIWWStXmrfcoqKyIvKspQJIRETEwSTfBluwAOLjb+z/9lt48kk4fx4eesic5VW1asb7b9nSHGjt5we7dkFQkHnFKTdRASQiIuJggoMhIMAc57N6tTkW6L334Pnn4do1c/HUjRshMDDzn1G3rnk1qXx5OHbsxi213EIFkIiIiINxcbkxJX72bOjfHwYPNguhfv1g6VLw9r73zylf3pwmX68eXLhgDqr+5ps0RlHbgAogERERB9Sli/nfZcvgo4/M2V2TJ8MHH4Czc9Z9TpEi5gKsrVub0+o7dXJh6dJyNp8mrwJIRETEAVWvDtWqmdseHrB4MQwalPY093vl5QXffA
MDBpjv58ypyrBhti1BVACJiIg4qIkTzac3b9gAzz6bvZ/l7AxTp8LEiYlYLAYPPmjbS0AuNv10ERERsZmQEPOVUywWGDgwCS+vSLp0eSznPjgNugIkIiIiOapECds/HVEFkIiIiDgcFUAiIiLicFQAiYiIiMNRASQiIiIORwWQiIiIOBwVQCIiIuJwVACJiIiIw1EBJCIiIg5HBZCIiIg4HBVAIiIi4nBUAImIiIjDUQEkIiIiDkcFkIiIiDgcF1sHkBsZhgFAdHR0mscTEhKIjY0lOjoaV1fXnAwtRylP++MouSpP+6I87Ut25pn893by3+N3ogIoDZcvXwagZMmSNo5EREREMury5csUKFDgjm0sRnrKJAeTlJTE6dOnyZ8/PxaLJdXx6OhoSpYsyd9//42Pj48NIswZytP+OEquytO+KE/7kp15GobB5cuXKVasGE5Odx7loytAaXBycqJEiRJ3befj42PXf0iTKU/74yi5Kk/7ojztS3blebcrP8k0CFpEREQcjgogERERcTgqgDLB3d2dUaNG4e7ubutQspXytD+OkqvytC/K077kljw1CFpEREQcjq4AiYiIiMNRASQiIiIORwWQiIiIOBwVQCIiIuJwVABlwvTp0yldujQeHh7UrVuXnTt32jqkTJswYQIPP/ww+fPnp2jRorRq1YoDBw6kaHPt2jX69etH4cKF8fb2pk2bNpw9e9ZGEWeNt99+G4vFwsCBA6377CnPU6dO0blzZwoXLoynpycPPvggP//8s/W4YRiMHDmSwMBAPD09CQ4O5uDBgzaMOOMSExMZMWIEZcqUwdPTk3LlyjF27NgUawDlxTx//PFHmjdvTrFixbBYLCxdujTF8fTk9N9//9GpUyd8fHzw9fWlZ8+exMTE5GAWd3enPBMSEhg2bBgPPvgg+fLlo1ixYnTt2pXTp0+n6CMv5Al3/05v9vLLL2OxWJg6dWqK/Xkh1/TkuX//flq0aEGBAgXIly8fDz/8MCdOnLAez8nfwyqAMmjRokWEhoYyatQodu/eTfXq1QkJCeHcuXO2Di1TNm7cSL9+/di+fTvh4eEkJCTQuHFjrly5Ym0zaNAgfvjhB7755hs2btzI6dOnefbZZ20Y9b356aef+Pjjj6lWrVqK/faS54ULF2jQoAGurq6sWrWKP/74g/fee4+CBQta20ycOJFp06Yxc+ZMduzYQb58+QgJCeHatWs2jDxj3nnnHWbMmMGHH37I/v37eeedd5g4cSIffPCBtU1ezPPKlStUr16d6dOnp3k8PTl16tSJ33//nfDwcJYvX86PP/5I7969cyqFdLlTnrGxsezevZsRI0awe/dulixZwoEDB2jRokWKdnkhT7j7d5rsu+++Y/v27RQrVizVsbyQ693yPHz4MI888giVKlUiMjKSffv2MWLECDw8PKxtcvT3sCEZUqdOHaNfv37W94mJiUaxYsWMCRMm2DCqrHPu3DkDMDZu3GgYhmFcvHjRcHV1Nb755htrm/379xuAsW3bNluFmWmXL182KlSoYISHhxsNGzY0Xn31VcMw7CvPYcOGGY888shtjyclJRkBAQHGu+++a9138eJFw93d3ViwYEFOhJglmjVrZrzwwgsp9j377LNGp06dDMOwjzwB47vvvrO+T09Of/zxhwEYP/30k7XNqlWrDIvFYpw6dSrHYs+IW/NMy86dOw3AOH78uGEYeTNPw7h9ridPnjSKFy9u/Pbbb8Z9991nTJkyxXosL+aaVp7t2rUzOnfufNtzcvr3sK4AZUB8fDy7du0iODjYus/JyYng4GC2bdtmw8iyzqVLlwAoVKgQALt27SIhISFFzpUqVaJUqVJ5Mud+/frRrFmzFPmAfeX5/fffU7t2bZ5//nmKFi1KzZo1mTVrlvX40aNHiYqKSpFrgQIFqFu3bp7KtX79+kRERPDXX38B8Msvv7B582aefvppwH7yvFl6ctq2bRu+vr7Url3b2iY4OBgnJy
d27NiR4zFnlUuXLmGxWPD19QXsK8+kpCS6dOnCkCFDqFKlSqrj9pBrUlISK1asoGLFioSEhFC0aFHq1q2b4jZZTv8eVgGUAefPnycxMRF/f/8U+/39/YmKirJRVFknKSmJgQMH0qBBA6pWrQpAVFQUbm5u1l86yfJizgsXLmT37t1MmDAh1TF7yvPIkSPMmDGDChUqsGbNGvr06cOAAQOYO3cugDWfvP7n+PXXX6d9+/ZUqlQJV1dXatasycCBA+nUqRNgP3neLD05RUVFUbRo0RTHXVxcKFSoUJ7N+9q1awwbNowOHTpYF8+0pzzfeecdXFxcGDBgQJrH7SHXc+fOERMTw9tvv02TJk1Yu3YtrVu35tlnn2Xjxo1Azv8e1mrwYtWvXz9+++03Nm/ebOtQstzff//Nq6++Snh4eIr7zfYoKSmJ2rVrM378eABq1qzJb7/9xsyZM+nWrZuNo8s6X3/9NV999RXz58+nSpUq7N27l4EDB1KsWDG7ytPRJSQk0LZtWwzDYMaMGbYOJ8vt2rWL999/n927d2OxWGwdTrZJSkoCoGXLlgwaNAiAGjVqsHXrVmbOnEnDhg1zPCZdAcoAPz8/nJ2dU41IP3v2LAEBATaKKmv079+f5cuXs2HDBkqUKGHdHxAQQHx8PBcvXkzRPq/lvGvXLs6dO8dDDz2Ei4sLLi4ubNy4kWnTpuHi4oK/v79d5AkQGBjIAw88kGJf5cqVrTMtkvPJ63+OhwwZYr0K9OCDD9KlSxcGDRpkvcJnL3neLD05BQQEpJqUcf36df777788l3dy8XP8+HHCw8OtV3/AfvLctGkT586do1SpUtbfTcePH+e1116jdOnSgH3k6ufnh4uLy11/N+Xk72EVQBng5uZGrVq1iIiIsO5LSkoiIiKCoKAgG0aWeYZh0L9/f7777jvWr19PmTJlUhyvVasWrq6uKXI+cOAAJ06cyFM5P/XUU/z666/s3bvX+qpduzadOnWybttDngANGjRI9SiDv/76i/vuuw+AMmXKEBAQkCLX6OhoduzYkadyjY2Nxckp5a8wZ2dn67807SXPm6Unp6CgIC5evMiuXbusbdavX09SUhJ169bN8ZgzK7n4OXjwIOvWraNw4cIpjttLnl26dGHfvn0pfjcVK1aMIUOGsGbNGsA+cnVzc+Phhx++4++mHP/7JsuHVdu5hQsXGu7u7sacOXOMP/74w+jdu7fh6+trREVF2Tq0TOnTp49RoEABIzIy0jhz5oz1FRsba23z8ssvG6VKlTLWr19v/Pzzz0ZQUJARFBRkw6izxs2zwAzDfvLcuXOn4eLiYrz11lvGwYMHja+++srw8vIyvvzyS2ubt99+2/D19TWWLVtm7Nu3z2jZsqVRpkwZ4+rVqzaMPGO6detmFC9e3Fi+fLlx9OhRY8mSJYafn58xdOhQa5u8mOfly5eNPXv2GHv27DEAY/LkycaePXuss5/Sk1OTJk2MmjVrGjt27DA2b95sVKhQwejQoYOtUkrTnfKMj483WrRoYZQoUcLYu3dvit9NcXFx1j7yQp6Gcffv9Fa3zgIzjLyR693yXLJkieHq6mp88sknxsGDB40PPvjAcHZ2NjZt2mTtIyd/D6sAyoQPPvjAKFWqlOHm5mbUqVPH2L59u61DyjQgzdfnn39ubXP16lWjb9++RsGCBQ0vLy+jdevWxpkzZ2wXdBa5tQCypzx/+OEHo2rVqoa7u7tRqVIl45NPPklxPCkpyRgxYoTh7+9vuLu7G0899ZRx4MABG0WbOdHR0carr75qlCpVyvDw8DDKli1rvPnmmyn+gsyLeW7YsCHN/ye7detmGEb6cvr333+NDh06GN7e3oaPj4/Ro0cP4/LlyzbI5vbulOfRo0dv+7tpw4YN1j7yQp6Gcffv9FZpFUB5Idf05PnZZ58Z5cuXNzw8PIzq1asbS5cuTdFHTv4ethjGTY9NFREREXEAGgMkIiIiDkcFkIiIiDgcFUAiIi
LicFQAiYiIiMNRASQiIiIORwWQiIiIOBwVQCIiIuJwVACJSK5UunRppk6dmu72kZGRWCyWVOsIiYikRQWQiNwTi8Vyx9fo0aMz1e9PP/1E7969092+fv36nDlzhgIFCmTq8zJi1qxZVK9eHW9vb3x9falZs6Z1EVaA7t2706pVq2yPQ0Qyz8XWAYhI3nbmzBnr9qJFixg5cmSKBQ+9vb2t24ZhkJiYiIvL3X/1FClSJENxuLm55cjK2LNnz2bgwIFMmzaNhg0bEhcXx759+/jtt9+y/bNFJOvoCpCI3JOAgADrq0CBAlgsFuv7P//8k/z587Nq1Spq1aqFu7s7mzdv5vDhw7Rs2RJ/f3+8vb15+OGHWbduXYp+b70FZrFY+PTTT2ndujVeXl5UqFCB77//3nr81ltgc+bMwdfXlzVr1lC5cmW8vb1p0qRJioLt+vXrDBgwAF9fXwoXLsywYcPo1q3bHa/efP/997Rt25aePXtSvnx5qlSpQocOHXjrrbcAGD16NHPnzmXZsmXWq2CRkZEA/P3337Rt2xZfX18KFSpEy5YtOXbsmLXv5CtHYWFhFClSBB8fH15++WXi4+Mz9+WIyG2pABKRbPf666/z9ttvs3//fqpVq0ZMTAxNmzYlIiKCPXv20KRJE5o3b86JEyfu2E9YWBht27Zl3759NG3alE6dOvHff//dtn1sbCyTJk1i3rx5/Pjjj5w4cYLBgwdbj7/zzjt89dVXfP7552zZsoXo6GiWLl16xxgCAgLYvn07x48fT/P44MGDadu2rbXYOnPmDPXr1ychIYGQkBDy58/Ppk2b2LJli7Uou7nAiYiIYP/+/URGRrJgwQKWLFlCWFjYHWMSkUzIliVWRcQhff7550aBAgWs75NXh751xee0VKlSxfjggw+s729dERsw/ve//1nfx8TEGICxatWqFJ914cIFayyAcejQIes506dPN/z9/a3v/f39jXfffdf6/vr160apUqWMli1b3jbO06dPG/Xq1TMAo2LFika3bt2MRYsWGYmJidY23bp1S9XHvHnzjPvvv99ISkqy7ouLizM8PT2NNWvWWM8rVKiQceXKFWubGTNmGN7e3in6F5F7pytAIpLtateuneJ9TEwMgwcPpnLlyvj6+uLt7c3+/fvvegWoWrVq1u18+fLh4+PDuXPnbtvey8uLcuXKWd8HBgZa21+6dImzZ89Sp04d63FnZ2dq1ap1xxgCAwPZtm0bv/76K6+++irXr1+nW7duNGnShKSkpNue98svv3Do0CHy58+Pt7c33t7eFCpUiGvXrnH48GFru+rVq+Pl5WV9HxQURExMDH///fcd4xKRjNEgaBHJdvny5UvxfvDgwYSHhzNp0iTKly+Pp6cnzz333F3Huri6uqZ4b7FY7lh0pNXeMIwMRp+2qlWrUrVqVfr27cvLL7/Mo48+ysaNG3niiSfSbB8TE0OtWrX46quvUh3L6IBvEbl3KoBEJMdt2bKF7t2707p1a8AsDm4eDJwTChQogL+/Pz/99BOPPfYYAImJiezevZsaNWpkqK8HHngAgCtXrgDmjLTExMQUbR566CEWLVpE0aJF8fHxuW1fv/zyC1evXsXT0xOA7du34+3tTcmSJTMUk4jcmW6BiUiOq1ChAkuWLGHv3r388ssvdOzY8Y5XcrLLK6+8woQJE1i2bBkHDhzg1Vdf5cKFC1gsltue06dPH8aOHcuWLVs4fvw427dvp2vXrhQpUoSgoCDAnMG2b98+Dhw4wPnz50lISKBTp074+fnRsmVLNm3axNGjR4mMjGTAgAGcPHnS2n98fDw9e/bkjz/+YOXKlYwaNYr+/fvj5KRf1yJZSf9HiUiOmzx5MgULFqR+/fo0b96ckJAQHnrooRyPY9iwYXTo0IGuXbsSFBSEt7c3ISEheHh43Pac4OBgtm/fzvPPP0/FihVp06YNHh4eREREULhwYQB69erF/fffT+3atSlSpAhbtmzBy8uLH3
/8kVKlSvHss89SuXJlevbsybVr11JcEXrqqaeoUKECjz32GO3ataNFixaZfpikiNyexciqG+IiInlcUlISlStXpm3btowdOzbHP7979+5cvHjxrlPxReTeaQyQiDis48ePs3btWusTnT/88EOOHj1Kx44dbR2aiGQz3QITEYfl5OTEnDlzePjhh2nQoAG//vor69ato3LlyrYOTUSymW6BiYiIiMPRFSARERFxOCqARERExOGoABIRERGHowJIREREHI4KIBEREXE4KoBERETE4agAEhEREYejAkhEREQcjgogERERcTj/By+XilMBOAnPAAAAAElFTkSuQmCC",
            "text/plain": [
              "<Figure size 640x480 with 1 Axes>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "# Plot the training loss recorded during training\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "# x positions: all_losses holds one value per 5 training steps (5, 10, 15, ...)\n",
        "batches = list(range(5, 5 * (len(all_losses) + 1), 5))\n",
        "plt.plot(batches, all_losses, linestyle='-', color='b')\n",
        "plt.title('Training Loss over Batches')\n",
        "plt.xlabel('Training Step')\n",
        "plt.ylabel('Training Loss')\n",
        "plt.grid(True)\n",
        "plt.show()"
      ]
    },
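    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The raw loss curve can fluctuate from one logged step to the next. One common way to read the overall trend is to overlay a moving average. The sketch below is not part of DeepChem. `moving_average` and `plot_smoothed_loss` are hypothetical helpers, and `log_every=5` assumes the same 5-step spacing between logged losses that the plotting cell above encodes.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "def moving_average(values, window=5):\n",
        "    # Simple (unweighted) moving average; returns len(values) - window + 1 points\n",
        "    values = np.asarray(values, dtype=float)\n",
        "    if window < 1 or window > len(values):\n",
        "        raise ValueError('window must be between 1 and len(values)')\n",
        "    kernel = np.ones(window) / window\n",
        "    return np.convolve(values, kernel, mode='valid')\n",
        "\n",
        "def plot_smoothed_loss(losses, log_every=5, window=5):\n",
        "    # Assumes one loss value was logged every `log_every` training steps\n",
        "    steps = np.arange(1, len(losses) + 1) * log_every\n",
        "    smoothed = moving_average(losses, window)\n",
        "    plt.plot(steps, losses, color='b', alpha=0.3, label='raw loss')\n",
        "    # Align each average with the last step of its window\n",
        "    plt.plot(steps[window - 1:], smoothed, color='b', label=f'{window}-point moving average')\n",
        "    plt.xlabel('Training Step')\n",
        "    plt.ylabel('Training Loss')\n",
        "    plt.legend()\n",
        "    plt.grid(True)\n",
        "    plt.show()\n",
        "```\n",
        "\n",
        "For example, `plot_smoothed_loss(all_losses)` redraws the curve from the previous cell with a 5-point moving average overlaid. A larger `window` gives a smoother curve but reacts more slowly to changes in the loss."
      ]
    },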
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PZno63YxJ-LD"
      },
      "source": [
        "### Evaluation and Prediction\n",
        "\n",
        "After fine-tuning, evaluate the classifier on the held-out test set. Here we use accuracy, the fraction of test proteins whose class is predicted correctly.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "InFHLFFHvmDW",
        "outputId": "07ba6f7e-9cb5-4290-b8ea-f3967d8eab6e"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'accuracy_score': 0.782}"
            ]
          },
          "execution_count": 24,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# Accuracy on the held-out test set; the task is binary, so n_classes=2\n",
        "classification_metric = dc.metrics.Metric(dc.metrics.accuracy_score)\n",
        "eval_score = ProtBERTmodel_for_classification.evaluate(\n",
        "    deeploc_test_dataset, [classification_metric], n_classes=2)\n",
        "eval_score"
      ]
    },
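    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Accuracy is the fraction of test proteins whose predicted class matches the true label. When a classifier outputs one probability per class, the predicted class is usually taken to be the one with the highest probability. The NumPy sketch below illustrates that computation. `accuracy_from_probabilities` is a hypothetical helper that assumes probabilities arranged as an array of shape `(n_samples, n_classes)`; it is not DeepChem's internal implementation.\n",
        "\n",
        "```python\n",
        "import numpy as np\n",
        "\n",
        "def accuracy_from_probabilities(probs, labels):\n",
        "    # probs: class probabilities, shape (n_samples, n_classes)\n",
        "    # labels: integer class labels, any shape that flattens to (n_samples,)\n",
        "    probs = np.asarray(probs, dtype=float)\n",
        "    labels = np.asarray(labels).astype(int).ravel()\n",
        "    predicted = probs.argmax(axis=1)  # most probable class per sample\n",
        "    return float((predicted == labels).mean())\n",
        "\n",
        "# Hypothetical toy example: 4 proteins, 2 classes\n",
        "toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]\n",
        "toy_labels = [0, 1, 1, 1]\n",
        "print(accuracy_from_probabilities(toy_probs, toy_labels))  # prints 0.75\n",
        "```\n",
        "\n",
        "The same check could be applied to the real test set by passing the model's predicted probabilities together with `deeploc_test_dataset.y`, provided the predictions are first reshaped to `(n_samples, 2)`. The exact output shape of the model's predictions is an assumption here and should be checked before relying on this comparison."
      ]
    },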
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "On the test set, the fine-tuned model reaches an accuracy of 0.782 in predicting soluble versus non-soluble proteins. For comparison, training on the complete dataset yields an accuracy of around 0.87 [1].\n",
        "\n",
        "[1] Elnaggar, Ahmed, et al. \"ProtTrans: Toward understanding the language of life through self-supervised learning.\" IEEE Transactions on Pattern Analysis and Machine Intelligence 44.10 (2021): 7112-7127."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3BeDseMEBvqI"
      },
      "source": [
        "# Congratulations! Time to join the Community!\n",
        "\n",
        "Congratulations on completing this tutorial notebook! If you enjoyed working through the tutorial and want to continue working with DeepChem, we encourage you to finish the rest of the tutorials in this series. You can also help the DeepChem community in the following ways:\n",
        "\n",
        "## Star DeepChem on [GitHub](https://github.com/deepchem/deepchem)\n",
        "This helps build awareness of the DeepChem project and the tools for open-source drug discovery that we're trying to build.\n",
        "\n",
        "## Join the DeepChem Gitter\n",
        "The DeepChem [Gitter](https://gitter.im/deepchem/Lobby) hosts a number of scientists, developers, and enthusiasts interested in deep learning for the life sciences. Join the conversation!"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.6"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
